I. INTRODUCTION


In the past, most detection, estimation, and control problems were studied in a linear space setting. While the linear space approach leads to simple solutions for linear systems, no effective synthesis procedures for optimal detection, estimation, and control have been obtained for large classes of nonlinear systems.

It is only natural to believe that a nonlinear problem is best studied in some kind of nonlinear space. Among all possible nonlinear spaces, it is natural to start with a space which is locally linear, on which a differential calculus can be used, and which has a group structure for us to utilize. Such a nonlinear space does exist in the mathematical literature and is called a Lie group. In fact, Lie groups were invented by Sophus Lie to study nonlinear differential equations. The theory of Lie groups is well established and provides us with a rich collection of geometric and algebraic tools.

In addition to their mathematical elegance, Lie groups are natural state spaces for many nonlinear problems of practical importance. Notable examples are the rotation groups, which are the state spaces for frequency demodulation, gyroscopic analysis, and satellite attitude estimation and control. Other examples can be found in power conversion, nuclear reactor control, and compartmental model studies in bioscience.

Recent years have seen many useful and interesting results on detection, estimation, and control problems with Lie group structures. Most of these results are facilitated by the rich geometric and algebraic structures which are inherent in these problems and are made clear only in a Lie group setting. The reader is referred to [1]-[3], from which most related articles that are not in the reference list of this chapter can be traced. This chapter is not intended to be a survey of the development of what is now called the geometric approach. We will rather restrict our attention mainly to estimation and detection and some closely related issues.

In contrast to the linear theory, the continuous-time and the discrete-time systems on Lie groups are very different in nature. The approaches to their estimation and detection problems are thus very different and have been developed on the basis of two separate ideas. The idea for continuous-time systems is that of "rolling without slipping." The idea for discrete-time systems is the use of exponential Fourier densities.

The continuous-time systems on a Lie group that correspond to the linear systems on a linear space are bilinear in form. In fact, the bilinear systems can be viewed as induced by the linear systems through "rolling without slipping." Furthermore, "rolling without slipping" can be shown to be an "almost sure" bijective mapping between the bilinear systems and the linear systems. It is known that the local study of a Lie group is entirely equivalent to the study of the finite-dimensional linear algebraic structure of the associated Lie algebra. "Rolling without slipping" does indeed facilitate a similar simplification in studying estimation and detection.


The exponential Fourier densities have been used to derive finite-dimensional optimal estimation schemes for many discrete-time systems on compact Lie groups. This is made possible mainly by the closure property of the exponential Fourier densities of any given finite order under the operation of taking conditional distributions. Another reason for using exponential Fourier densities is that any continuous or bounded-variation probability density on a compact Lie group can be approximated as closely as desired by such a density.

Most of these ideas can be clearly illustrated on the unit circle, the simplest compact Lie group. The circle is also the natural state space for many estimation and detection problems of practical importance, such as frequency and phase demodulation and single-degree-of-freedom gyroscopic analysis. Therefore, a detailed theory of estimation on the circle will be presented in the next three sections. No knowledge of Lie groups is required to understand them. Estimation and detection on general Lie groups are studied in the last two sections. The required definitions and theorems from Lie theory are briefly summarized there.

Although the two sections on general Lie groups and the three sections on the circle can be read independently, an understanding of the circle case can definitely help in understanding the problems and the results on general Lie groups. The main references for Sections II-VI are [4]-[8], respectively. Section V is the only section that contains some new results.


This chapter is not intended to exhaust all existing results on estimation and detection problems with Lie group structure. The interested reader is referred to [9]-[17] for some of the results beyond this chapter.


II. PROBABILITY ON THE CIRCLE.


There are many fundamental differences between the estimation and detection problems on Euclidean spaces and those on Lie groups. In order for the reader to appreciate them, this section addresses some probabilistic elements on the circle. The probability distribution function and the characteristic function on the circle will first be briefly introduced.

One of the main concerns in this chapter is to study how one uses the knowledge of the probability distribution of a random variable taking values on a Lie group to determine an estimate of the random variable that minimizes a certain error criterion. The conventional least squares technique cannot be used here. Let us take the circle as an example. The squared error between the angles 0° and 359° is (359°)², whereas by geometrical intuition they are only 1° apart. In Subsection II.3 we will look into this issue on the circle in detail.

The importance of the normal probability densities cannot be overemphasized for estimation and detection on Euclidean spaces. Unfortunately, there does not exist an analogous density on the circle that possesses all the nice properties of the normal density. In fact, the nice properties of the normal density are almost equally divided between two contenders for normalcy, the folded normal density and the circular normal density. It turns out that while the folded normal density is natural to use for continuous-time estimation, the circular normal density is more suitable for discrete-time estimation. They will both be discussed and compared in this section.


II.1. The Distribution Function.


A point on the unit circle can be represented either by the angle $\theta \in (-\pi, \pi]$ it makes with a fixed reference point on the circle or by the $2 \times 2$ orthogonal matrix

$$\exp(R\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = I\cos\theta + R\sin\theta, \qquad R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},$$

where the matrix $R$ is called the infinitesimal rotation matrix. The addition of two angles $\theta_1$ and $\theta_2$ modulo $2\pi$, denoted $\theta_1 \oplus \theta_2$, corresponds to the multiplication of the two matrices representing the points.
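As a quick numerical check of this matrix representation, the minimal NumPy sketch below (the helper names `rot`, `expm_series`, `wrap`, and `oplus` are hypothetical, introduced only for illustration) verifies that $\exp(R\theta) = I\cos\theta + R\sin\theta$ and that adding two angles modulo $2\pi$ corresponds to multiplying their rotation matrices.

```python
import math
import numpy as np

# Infinitesimal rotation matrix R (R @ R = -I), as in the text.
R = np.array([[0.0, 1.0], [-1.0, 0.0]])

def rot(theta):
    """Hypothetical helper: exp(R theta) = I cos(theta) + R sin(theta)."""
    return np.eye(2) * math.cos(theta) + R * math.sin(theta)

def expm_series(A, terms=40):
    """Truncated power series sum_{j < terms} A^j / j!, used only as a cross-check."""
    out = np.zeros_like(A)
    term = np.eye(A.shape[0])
    for j in range(terms):
        out = out + term
        term = term @ A / (j + 1)
    return out

def wrap(theta):
    """Representative of theta in (-pi, pi]."""
    t = math.fmod(theta + math.pi, 2 * math.pi)
    if t <= 0.0:
        t += 2 * math.pi
    return t - math.pi

def oplus(t1, t2):
    """theta_1 (+) theta_2: addition of angles modulo 2 pi."""
    return wrap(t1 + t2)

t1, t2 = 2.5, 1.9
print(np.allclose(rot(t1) @ rot(t2), rot(oplus(t1, t2))))   # True
print(np.allclose(rot(t1), expm_series(R * t1)))            # True
```

The closed form is used instead of a general matrix exponential because $R^2 = -I$ makes the series collapse to cosine and sine terms.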

Let $\theta$ be a random variable taking values on $S^1 = (-\pi, \pi]$. The distribution function $F$ of $\theta$ can be defined on $[-\pi, \pi]$ by the equation

$$F(\theta_1) = P(-\pi < \theta \le \theta_1), \qquad \theta_1 \in [-\pi, \pi].$$

This function $F$ is usually extended to the whole real line by the equation

$$F(\theta_1 + 2\pi) - F(\theta_1) = 1.$$

The function $F$ defined this way is called the distribution function (d.f.) of $\theta$ on the circle.

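Given two points $\theta_1$ and $\theta_2$ on $S^1$, we denote by $\mathrm{arc}(\theta_1, \theta_2)$ the set of points from $\theta_1$ to $\theta_2$ in the counter-clockwise direction, with $\theta_1$ excluded and $\theta_2$ included. It follows that

$$P(\theta \in \mathrm{arc}(\theta_1, \theta_2)) = F(\theta_2) - F(\theta_1), \qquad \theta_1 \le \theta_2 \le \theta_1 + 2\pi.$$

There is a natural projection from $R^1$ to $S^1$ defined by $x \mapsto \theta = x \bmod 2\pi$. Let $\theta_1 = x_1 \bmod 2\pi$ and $\theta_2 = x_2 \bmod 2\pi$. It can be shown that

$$P(\theta \in \mathrm{arc}(\theta_1, \theta_2)) = (F(x_2) - F(x_1)) \bmod 1.$$

We note that the d.f. $F$ is a right-continuous function but, in contrast with d.f.'s on the real line,

$$\lim_{\theta \to \infty} F(\theta) = \infty, \qquad \lim_{\theta \to -\infty} F(\theta) = -\infty.$$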
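The extended d.f. and the arc formula can be checked numerically. The sketch below is an illustration under stated assumptions: it builds an empirical d.f. from simulated samples (the wrapped normal sampling distribution, the seed, and the names `lift`, `CircularDF`, and `arc_prob_direct` are all hypothetical choices), extends it to the real line by $F(x + 2\pi) = F(x) + 1$, and compares $(F(x_2) - F(x_1)) \bmod 1$ with a direct count of the samples lying in $\mathrm{arc}(\theta_1, \theta_2)$.

```python
import math
import numpy as np

def lift(x):
    """Write x = theta + 2 pi k with theta in (-pi, pi] and integer k."""
    k = math.ceil((x - math.pi) / (2 * math.pi))
    return x - 2 * math.pi * k, k

class CircularDF:
    """Empirical circular d.f. F(theta) = P(-pi < Theta <= theta) on (-pi, pi],
    extended to the real line by F(x + 2 pi) = F(x) + 1 (hypothetical helper)."""

    def __init__(self, samples):
        self.s = np.sort([lift(v)[0] for v in samples])

    def __call__(self, x):
        theta, k = lift(x)
        # side="right" counts samples <= theta, so F is right-continuous
        return np.searchsorted(self.s, theta, side="right") / len(self.s) + k

def arc_prob_direct(samples, t1, t2):
    """Fraction of samples in arc(t1, t2): counter-clockwise, t1 excluded, t2 included."""
    d = (np.asarray(samples) - t1) % (2 * math.pi)
    span = (t2 - t1) % (2 * math.pi)
    return float(np.mean((d > 0) & (d <= span)))

# Assumed test data: wrapped normal samples with mean 2.0 and standard deviation 1.5.
rng = np.random.default_rng(0)
samples = [lift(v)[0] for v in rng.normal(2.0, 1.5, size=2000)]
F = CircularDF(samples)

t1, t2 = 2.0, -2.5                              # this arc passes through the point pi
x1, x2 = t1 + 6 * math.pi, t2 - 2 * math.pi     # arbitrary lifts of t1, t2 to the real line
p_formula = (F(x2) - F(x1)) % 1.0
p_direct = arc_prob_direct(samples, t1, t2)
print(p_formula, p_direct)                      # the two numbers agree
```

Note that the answer does not depend on which lifts $x_1, x_2$ are chosen, which is exactly what the reduction modulo 1 accounts for.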


If the d.f. $F$ is absolutely continuous, it has a probability density function (p.d.f.) $f$ such that

$$\int_{\theta_1}^{\theta_2} f(\theta)\, d\theta = F(\theta_2) - F(\theta_1).$$

A given function $f$ is the p.d.f. of an absolutely continuous distribution if and only if (i) $f(x) \ge 0$, $x \in R^1$; (ii) $f(x + 2\pi) = f(x)$; (iii) $\int_{-\pi}^{\pi} f(x)\, dx = 1$.
II.2. The Characteristic Function.

Another representation of $S^1$ is as the set of complex numbers of unit length. Any such number can be uniquely written as $e^{i\theta}$, $\theta \in (-\pi, \pi]$. If $\theta$ is a random variable taking values on $(-\pi, \pi]$, then $z = \exp(i\theta)$ is a random variable taking values on the unit circle in the complex plane, which will also be denoted $S^1$.

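The characteristic function (c.f.) of $\theta$ (or $z$) is the function $\psi$ defined on the integers $t = 0, \pm 1, \pm 2, \ldots$ by

$$\psi(t) = E\exp(it\theta) = \int_{-\pi}^{\pi} \exp(it\theta)\, dF(\theta),$$

where $F(\theta)$ is the circular d.f. of $\theta$. Obviously $\psi(0) = 1$, $|\psi(t)| \le 1$, and $\psi(-t) = \overline{\psi(t)}$. The expectations $\alpha(t) = E(\cos t\theta) = \mathrm{Re}\,\psi(t)$ and $\beta(t) = E(\sin t\theta) = \mathrm{Im}\,\psi(t)$ are called the $t$-th order cosine and sine moments, respectively. If $\sum_{t=1}^{\infty} (\alpha^2(t) + \beta^2(t))$ is convergent, the random variable $\theta$ has a density which is defined almost everywhere by

$$f(\theta) = \frac{1}{2\pi} \sum_{t=-\infty}^{\infty} \psi(t) \exp(-it\theta).$$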

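The joint c.f. of two circular random variables $\theta_1$ and $\theta_2$ is defined by $\psi(t, s) = E\exp i(t\theta_1 + s\theta_2)$, where $t$ and $s$ are integers. Let the c.f.'s of $\theta_1$ and $\theta_2$ be $\psi_1$ and $\psi_2$. Then $\theta_1$ and $\theta_2$ are independent if and only if $\psi(t, s) = \psi_1(t)\psi_2(s)$ for all integers $t$ and $s$. Furthermore, if $\theta_1$ and $\theta_2$ are independent and their p.d.f.'s exist, then the convolution of these p.d.f.'s is the p.d.f. of $\theta_1 \oplus \theta_2$, i.e.,

$$f(\theta) = (2\pi)^{-1} \sum_{t=-\infty}^{\infty} \psi_1(t)\psi_2(t) \exp(-it\theta).$$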
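The inversion formula and the convolution rule can be illustrated numerically. The sketch below assumes two circular normal test densities (introduced in Subsection II.5), a uniform grid of 512 points, and truncation of the inversion sum at $|t| \le 40$; these choices and the helper names are hypothetical. It computes $\psi(t)$ by the periodic rectangle rule, recovers a density from its c.f., and compares a directly computed circular convolution with the inversion of $\psi_1(t)\psi_2(t)$.

```python
import numpy as np

M = 512
theta = -np.pi + 2 * np.pi * np.arange(1, M + 1) / M    # uniform grid on (-pi, pi]
h = 2 * np.pi / M

def von_mises(x, mu, kappa):
    """Assumed smooth test density on the circle (the circular normal density)."""
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

def char_fn(f_vals, t):
    """psi(t) = E exp(i t theta), approximated by the periodic rectangle rule."""
    return np.sum(f_vals * np.exp(1j * t * theta)) * h

def density_from_cf(psi, T):
    """f(theta) = (2 pi)^{-1} sum_{|t| <= T} psi(t) exp(-i t theta)."""
    total = sum(psi(t) * np.exp(-1j * t * theta) for t in range(-T, T + 1))
    return np.real(total) / (2 * np.pi)

f1 = von_mises(theta, 0.5, 2.0)
f2 = von_mises(theta, -1.0, 1.0)
psi1 = lambda t: char_fn(f1, t)
psi2 = lambda t: char_fn(f2, t)

# Inversion: the density is recovered from its characteristic function.
f1_rec = density_from_cf(psi1, T=40)

# Density of theta_1 (+) theta_2 for independent theta_1, theta_2, computed two ways.
conv_direct = (f1[None, :] * von_mises(theta[:, None] - theta[None, :], -1.0, 1.0)).sum(axis=1) * h
conv_cf = density_from_cf(lambda t: psi1(t) * psi2(t), T=40)
```

The direct convolution costs $O(M^2)$ operations, while the c.f. route only multiplies coefficients; that is the same closure-under-convolution mechanism exploited later for the folded normal density.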


The standard distance function on the circle, the distance $\rho$ between two points on the circle, is the arc length of the shorter path joining them. If we restrict $\theta_1$ and $\theta_2$ to take values in the range $(-\pi, \pi]$, we have

$$\rho(\theta_1, \theta_2) = \min\left(|\theta_1 - \theta_2|,\ 2\pi - |\theta_1 - \theta_2|\right).$$
The class of error criteria we wish to consider is the class of symmetric, nondecreasing cost functions, i.e., functions $\phi: S^1 \to R$ which satisfy

$$0 \le \phi(\theta) = \phi(-\theta), \qquad 0 \le \rho(\theta_1, 0) \le \rho(\theta_2, 0) \implies \phi(\theta_1) \le \phi(\theta_2). \tag{1}$$

Some examples of cost criteria satisfying (1) are $\rho(\theta) = \rho(\theta, 0)$, $\rho^2(\theta)$, $1 - \cos\theta$, and $(1 - \cos\theta)^2$. We also wish to consider the special class of unimodal, mode-symmetric probability density functions, i.e., density functions of the form $p: S^1 \to [0, \infty)$ with a unique maximum at $\eta$, such that

$$p(\eta + \theta) = p(\eta - \theta) \quad \forall \theta.$$

As the following theorem demonstrates, under these conditions the mode of the density is the optimal estimate.

Theorem 1: Given an error function $\phi$ that satisfies (1) and a unimodal, mode-symmetric probability density function $p$, the expected estimation error is minimized at the mode, i.e.,

$$E(\phi(\theta - \eta)) \le E(\phi(\theta - a)) \quad \forall a,$$

where $p$ has its maximum at $\eta$.

Proof: The theorem follows immediately from results on similarly ordered functions and the rearrangement inequalities. The basic result for real-valued functions defined on $R^1$ is contained in [18] (Thm. 378) and [19, p. 183]. The result for $S^1$ is obtained by making only minor changes in these proofs.

We remark that from the symmetry of the problem, $\phi$ has its global maximum at $\pi$ and $p$ has its global minimum at $\eta \oplus \pi$. Thus

$$E(\phi(\theta - \eta + \pi)) \ge E(\phi(\theta - a)) \quad \forall a. \qquad \blacksquare$$

It should be noted that Theorem 1 is the analog of a result of [20], [21]. Note that the same result is true if no probability density exists but the probability measure is unimodal at, and symmetric about, some point $\eta$; for $\eta = 0$ this means that the d.f. $F$ is convex on $(-\pi, 0]$ and $F(\theta) = 1 - F(-\theta)$ at each continuity point of $F$.
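Theorem 1 can be illustrated (not proved) by brute force. The sketch below assumes a circular normal test density with mode $\eta = 0.7$ and concentration 1.5 (both arbitrary choices), evaluates the expected cost $E\phi(\theta - a)$ on a grid of candidate estimates $a$ for the four cost criteria listed after (1), and records the minimizing candidate, which in each case should sit at the mode.

```python
import numpy as np

M = 4096
theta = -np.pi + 2 * np.pi * np.arange(1, M + 1) / M
h = 2 * np.pi / M

def dist(x):
    """rho(x, 0): arc length of the shorter path from x to 0 (np.angle lies in (-pi, pi])."""
    return np.abs(np.angle(np.exp(1j * x)))

costs = {
    "rho": dist,
    "rho^2": lambda x: dist(x) ** 2,
    "1-cos": lambda x: 1 - np.cos(x),
    "(1-cos)^2": lambda x: (1 - np.cos(x)) ** 2,
}

eta, kappa = 0.7, 1.5                     # assumed mode and concentration of the test density
p = np.exp(kappa * np.cos(theta - eta))
p /= p.sum() * h                          # unimodal and symmetric about its mode eta

candidates = np.linspace(-np.pi, np.pi, 721)[1:]   # grid of estimates a in (-pi, pi]
best = {}
for name, phi in costs.items():
    risk = [(phi(theta - a) * p).sum() * h for a in candidates]   # E phi(theta - a)
    best[name] = candidates[int(np.argmin(risk))]
print(best)                               # every entry is close to eta = 0.7
```

A grid search is used only because it makes no assumptions about smoothness of the risk; the resolution of the candidate grid (about 0.009 rad) limits how closely the minimizer can be located.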

Let us now restrict our attention to the error function $\phi(\theta) = 1 - \cos\theta$. This function was used widely in statistics [4] and was used in [13] to design a phase-tracking system. It is especially interesting because locally it is a quadratic function, i.e., $1 - \cos\theta \approx \theta^2/2$ for $|\theta| \ll 1$. Let $\hat\theta$ denote the optimal estimate of the random variable $\theta$ on $S^1$ with respect to the error criterion $E(1 - \cos(\theta - \hat\theta))$. As

$$E(1 - \cos(\theta - \hat\theta)) = 1 - [E\cos\theta,\ E\sin\theta]\,[\cos\hat\theta,\ \sin\hat\theta]^T,$$

and the inner product is largest when the unit vector $[\cos\hat\theta, \sin\hat\theta]$ points along $[E\cos\theta, E\sin\theta]$, the optimal estimate $\hat\theta$ is determined (provided $\rho > 0$) by

$$\cos\hat\theta = \frac{1}{\rho} E\cos\theta, \qquad \sin\hat\theta = \frac{1}{\rho} E\sin\theta, \qquad \rho = \left[(E\cos\theta)^2 + (E\sin\theta)^2\right]^{1/2}.$$

We note that the complex number $\psi(1)$ defined in Subsection II.2 can be expressed as $\rho\exp(i\hat\theta)$. This number is called the resultant of $\theta$. In analogy to the linear space case, the optimal estimate $\hat\theta$ is called the circular mean of $\theta$, and the minimum estimation error $1 - \rho$ is called the circular variance.
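The 0° versus 359° example from the beginning of this section makes the difference concrete. The minimal sketch below (the function name is hypothetical) computes the circular mean as the argument of the sample resultant and the circular variance as $1 - \rho$, and cross-checks by grid search that the circular mean minimizes the empirical risk $E(1 - \cos(\theta - a))$.

```python
import numpy as np

def circular_mean_and_variance(angles):
    """Circular mean (argument of the resultant) and circular variance 1 - rho."""
    c, s = np.mean(np.cos(angles)), np.mean(np.sin(angles))
    rho = np.hypot(c, s)                 # length of the resultant psi(1)
    return np.arctan2(s, c), 1.0 - rho

angles = np.radians([0.0, 359.0])
mean, var = circular_mean_and_variance(angles)
print(np.degrees(angles).mean())         # naive arithmetic mean, about 179.5
print(np.degrees(mean))                  # circular mean, about -0.5 (i.e. 359.5 degrees)

# Cross-check: the circular mean minimizes the empirical risk E(1 - cos(theta - a)).
grid = np.linspace(-np.pi, np.pi, 3601)
risk = [np.mean(1 - np.cos(angles - a)) for a in grid]
best = grid[int(np.argmin(risk))]
```

The naive average lands on the opposite side of the circle, whereas the circular mean bisects the 1° arc between the two observations.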


II.4. Folded Normal Densities.

Given a random variable $x$ on $R^1$ with d.f. $F_x$, the random variable $\theta = x \bmod 2\pi$ on the circle has the d.f. $F$ defined by

$$F(\theta) = \sum_{k=-\infty}^{\infty} \left( F_x(\theta + 2\pi k) - F_x(2\pi k - \pi) \right), \qquad \theta \in (-\pi, \pi].$$

This can be viewed as obtained from wrapping $F_x$ around the circumference of the unit circle. If $x$ has a p.d.f. $p_x(x)$, the corresponding p.d.f. of $\theta$ is

$$p(\theta) = \sum_{k=-\infty}^{\infty} p_x(\theta + 2k\pi).$$

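Corresponding to a normal density $p_x$ with mean $\eta$ and variance $\gamma$, the folded normal density

$$F(\theta; \eta, \gamma) = \frac{1}{\sqrt{2\pi\gamma}} \sum_{k=-\infty}^{\infty} \exp\left[ -\frac{1}{2\gamma} (\theta - \eta - 2k\pi)^2 \right] \tag{2a}$$

plays a central role in the continuous-time estimation problem considered in Section IV. The Fourier series representation of the folded normal density is

$$F(\theta; \eta, \gamma) = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{k=1}^{\infty} \exp\left( -\frac{k^2\gamma}{2} \right) \cos k(\theta - \eta). \tag{2b}$$

From this representation it is easy to see that the convolution of two folded normal densities, $F(\theta; \eta_1, \gamma_1)$ and $F(\theta; \eta_2, \gamma_2)$, is the folded normal density $F(\theta; \eta_1 \oplus \eta_2, \gamma_1 + \gamma_2)$. Further important properties of the folded normal density are studied below (see [6]).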
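Both representations, and the convolution property, are easy to check numerically. In the sketch below the truncation levels ($|k| \le 20$ in (2a), $k \le 200$ in (2b)), the grid size, and the parameter values are assumptions chosen so that the neglected terms are far below double-precision rounding.

```python
import numpy as np

def folded_normal_2a(theta, eta, gamma, K=20):
    """(2a): wrap N(eta, gamma) around the circle, truncating the sum at |k| <= K."""
    k = np.arange(-K, K + 1)[:, None]
    x = np.asarray(theta)[None, :] - eta - 2 * np.pi * k
    return np.exp(-x**2 / (2 * gamma)).sum(axis=0) / np.sqrt(2 * np.pi * gamma)

def folded_normal_2b(theta, eta, gamma, K=200):
    """(2b): 1/(2 pi) + (1/pi) sum_k exp(-k^2 gamma / 2) cos k(theta - eta), k <= K."""
    k = np.arange(1, K + 1)[:, None]
    terms = np.exp(-k**2 * gamma / 2) * np.cos(k * (np.asarray(theta)[None, :] - eta))
    return 1 / (2 * np.pi) + terms.sum(axis=0) / np.pi

M = 512
theta = -np.pi + 2 * np.pi * np.arange(1, M + 1) / M
h = 2 * np.pi / M

# The two representations agree for small, moderate, and large variances.
gap = max(np.max(np.abs(folded_normal_2a(theta, 0.4, g) - folded_normal_2b(theta, 0.4, g)))
          for g in (0.05, 0.5, 3.0))

# Closure under convolution: F(.; eta1, g1) * F(.; eta2, g2) = F(.; eta1 (+) eta2, g1 + g2).
eta1, g1, eta2, g2 = 2.0, 0.3, 1.8, 0.6      # assumed parameters; eta1 + eta2 wraps past pi
f1 = folded_normal_2a(theta, eta1, g1)
conv = np.array([(f1 * folded_normal_2a(t - theta, eta2, g2)).sum() * h for t in theta])
conv_gap = np.max(np.abs(conv - folded_normal_2a(theta, eta1 + eta2, g1 + g2)))
```

In practice (2a) converges fastest for small $\gamma$ and (2b) for large $\gamma$, so a numerical implementation would typically switch between them.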


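Theorem 2: The folded normal density, (2a) and (2b), is unimodal with mode at $\theta = \eta$ and is symmetric about $\eta$.

Proof: Since $\cos k(\theta - \eta) \le 1$, the second form of $F$, (2b), yields

$$F(\theta; \eta, \gamma) \le \frac{1}{2\pi} + \frac{1}{\pi} \sum_{n=1}^{\infty} e^{-n^2\gamma/2} = F(\eta; \eta, \gamma).$$

Thus $F$ has its global maximum at $\theta = \eta$.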


Since $F(\theta; \eta, \gamma) = F(\theta - \eta; 0, \gamma)$, we need only show that $F(\theta; 0, \gamma)$ is symmetric about 0 and monotone decreasing as $\rho(\theta, 0)$ increases. Symmetry is obvious ($\cos n\theta = \cos n(-\theta)$), and monotonicity will follow if we can show

$$\frac{\partial F}{\partial\theta}(\theta; 0, \gamma) < 0, \qquad \theta \in (0, \pi), \tag{3a}$$

$$\frac{\partial F}{\partial\theta}(\theta; 0, \gamma) > 0, \qquad \theta \in (-\pi, 0). \tag{3b}$$

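We now remark that the properties of $F(\theta; 0, \gamma)$ have been studied extensively, since it is a theta function. See [22] and [23] for discussions of some properties of theta functions. Using the notation of [22, pp. 2, 42], we have

$$F(\theta; 0, \gamma) = \frac{1}{2\pi}\vartheta_3\!\left(\frac{\theta}{2}, q\right) = \frac{1}{2\pi}\left[1 + 2\sum_{n=1}^{\infty} q^{n^2}\cos n\theta\right] = \frac{G}{2\pi}\prod_{n=1}^{\infty}\left(1 + 2q^{2n-1}\cos\theta + q^{4n-2}\right), \tag{4}$$

where $q = e^{-\gamma/2}$ and

$$G = \prod_{n=1}^{\infty} (1 - q^{2n}).$$

Using the fact that $F > 0$ and the form of $F$ given by (4), we have

$$\frac{\partial F(\theta; 0, \gamma)/\partial\theta}{F(\theta; 0, \gamma)} = -\left[\sum_{n=1}^{\infty} \frac{2q^{2n-1}}{1 + 2q^{2n-1}\cos\theta + q^{4n-2}}\right]\sin\theta. \tag{5}$$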

It is easily seen that the term in square brackets on the right-hand side of (5) is positive for all values of $\theta$, because $1 + 2q^{2n-1}\cos\theta + q^{4n-2} \ge (1 - q^{2n-1})^2 > 0$ for $0 < q < 1$; thus (3) is correct. Some work along these lines has been done in [53]. See [23] for discussions of other relevant properties of theta functions, hypergeometric functions, Legendre polynomials, and Tchebycheff polynomials.
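The product form (4) and the logarithmic derivative (5) can be checked against the Fourier series (2b). The sketch below truncates all infinite sums and products at 200 terms (an assumption that is harmless for the chosen $\gamma$ values), compares (4) with (2b), compares (5) with a central finite difference of $\log F$, and confirms the sign pattern (3a)-(3b). The finite difference uses the product form because it has no cancellation where $F$ is tiny.

```python
import numpy as np

def F_series(theta, gamma, K=200):
    """(2b) with eta = 0."""
    k = np.arange(1, K + 1)[:, None]
    return 1 / (2 * np.pi) + (np.exp(-k**2 * gamma / 2) * np.cos(k * theta)).sum(axis=0) / np.pi

def F_product(theta, gamma, nmax=200):
    """(4): (G / 2 pi) prod_n (1 + 2 q^{2n-1} cos theta + q^{4n-2}), q = exp(-gamma / 2)."""
    q = np.exp(-gamma / 2)
    n = np.arange(1, nmax + 1)[:, None]
    G = np.prod(1 - q ** (2 * n))
    factors = 1 + 2 * q ** (2 * n - 1) * np.cos(theta) + q ** (4 * n - 2)
    return G / (2 * np.pi) * np.prod(factors, axis=0)

def log_derivative(theta, gamma, nmax=200):
    """Right-hand side of (5): (dF/dtheta) / F."""
    q = np.exp(-gamma / 2)
    a = q ** (2 * np.arange(1, nmax + 1)[:, None] - 1)
    return -(2 * a / (1 + 2 * a * np.cos(theta) + a**2)).sum(axis=0) * np.sin(theta)

theta = np.linspace(-np.pi + 1e-3, np.pi - 1e-3, 2001)
delta = 1e-5
results = {}
for gamma in (0.2, 1.0, 4.0):
    gap = np.max(np.abs(F_product(theta, gamma) - F_series(theta, gamma)))
    fd = (np.log(F_product(theta + delta, gamma))
          - np.log(F_product(theta - delta, gamma))) / (2 * delta)
    ld = log_derivative(theta, gamma)
    fd_ok = bool(np.allclose(fd, ld, rtol=1e-6, atol=1e-6))
    sign_ok = bool(np.all(ld[theta > 0] < 0) and np.all(ld[theta < 0] > 0))   # (3a), (3b)
    results[gamma] = (gap, fd_ok, sign_ok)
print(results)
```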

Note that the symmetry requirements of Theorem 1 are necessary. For instance, if $\phi$ is not symmetric, the mode of the density need not be the optimal estimate even if all the other assumptions of Theorem 1 do hold. As an example, consider the function $\phi: S^1 \to R$

$$\phi(\theta) = \begin{cases} \cdots, & 0 \le \theta \le \pi, \\ \cdots, & -\pi \le \theta < 0. \end{cases}$$

Suppose our distribution is the folded normal centered at 0. Then it can be shown that the mode, 0, is not the optimal estimate.

Theorem 3. Let $\phi$ satisfy the second requirement of (1) and let $p(\theta) = F(\theta; \eta, \gamma)$. Then $E(\phi(\theta - \eta))$ is an increasing function of the variance $\gamma$, that is,

$$\frac{d}{d\gamma} E(\phi(\theta - \eta)) \ge 0. \tag{6}$$

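Proof: Writing

$$\phi(\theta) = d_0 + \sum_{n=1}^{\infty} (c_n \sin n\theta + d_n \cos n\theta)$$

and using the results on Fourier series analysis, we obtain

$$E(\phi(\theta - \eta)) = d_0 + \sum_{n=1}^{\infty} d_n e^{-n^2\gamma/2}, \tag{7}$$

but we get the same error if we compute $E(\tilde\phi(\theta - \eta))$, where $\tilde\phi$ is the symmetrized function

$$\tilde\phi(\theta) = \tfrac{1}{2}\left(\phi(\theta) + \phi(-\theta)\right),$$

which also satisfies (1). Thus, it is enough to prove the theorem for $\phi$ satisfying (1). In this case $\eta$ is the optimal estimate and

$$E(\phi(\theta - \eta)) = \int_{-\pi}^{\pi} \phi(\theta - \eta) F(\theta; \eta, \gamma)\, d\theta = \int_{-\pi}^{\pi} \phi(\theta) F(\theta; 0, \gamma)\, d\theta = 2\int_0^{\pi} \phi(\theta) F(\theta; 0, \gamma)\, d\theta.$$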
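Then (6) will hold if

$$\int_0^{\pi} \phi(\theta)\,\frac{\partial}{\partial\gamma}F(\theta; 0, \gamma)\, d\theta \ge 0.$$

Suppose we can show that there exists $\theta_0 \in [0, \pi]$ such that

$$\frac{\partial}{\partial\gamma}F(\theta; 0, \gamma) < 0, \quad \theta \in (0, \theta_0); \qquad \frac{\partial}{\partial\gamma}F(\theta_0; 0, \gamma) = 0; \qquad \frac{\partial}{\partial\gamma}F(\theta; 0, \gamma) > 0, \quad \theta \in (\theta_0, \pi].$$

Then, since

$$\phi(\theta) \le \phi(\theta_0), \quad \theta \in [0, \theta_0]; \qquad \phi(\theta) \ge \phi(\theta_0), \quad \theta \in [\theta_0, \pi],$$

we have

$$\int_0^{\pi} \phi(\theta)\,\frac{\partial}{\partial\gamma}F(\theta; 0, \gamma)\, d\theta \ge \phi(\theta_0)\,\frac{d}{d\gamma}\int_0^{\pi} F(\theta; 0, \gamma)\, d\theta = \phi(\theta_0)\,\frac{d}{d\gamma}\left(\frac{1}{2}\right) = 0,$$

and we get a strict inequality if $\phi$ is not a constant.

Now it is easy to see from (2b) that

$$\frac{\partial}{\partial\gamma}F(\theta; 0, \gamma) = \frac{1}{2}\,\frac{\partial^2}{\partial\theta^2}F(\theta; 0, \gamma),$$

and the theorem will be proved once we prove the following lemma, which yields more information about the shape of the folded normal density.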
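Lemma 1: For an arbitrary but fixed value of $\gamma > 0$, there exists $\theta_0 \in [0, \pi]$ such that

$$\frac{\partial^2 F}{\partial\theta^2}(\theta; 0, \gamma) < 0, \quad \theta \in (0, \theta_0); \qquad \frac{\partial^2 F}{\partial\theta^2}(\theta_0; 0, \gamma) = 0; \qquad \frac{\partial^2 F}{\partial\theta^2}(\theta; 0, \gamma) > 0, \quad \theta \in (\theta_0, \pi].$$

Proof: $\cdots$

$$\frac{\partial^2 F(\theta_0; 0, \gamma)/\partial\theta^2}{F(\theta_0; 0, \gamma)} = 0,$$

and the lemma and the theorem are proved.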

Note that by symmetry $F$ has a unique inflection point at $-\theta_0$ on the interval $[-\pi, 0]$.
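The two ingredients of this argument, the heat-equation relation $\partial F/\partial\gamma = \tfrac{1}{2}\,\partial^2 F/\partial\theta^2$ and the single sign change of $\partial^2 F/\partial\theta^2$ on $(0, \pi)$ asserted by Lemma 1, can be spot-checked numerically. The sketch below is only a check for a few assumed values of $\gamma$, not a proof; it differentiates the sum (2a) term by term (truncated at $|k| \le 20$) and counts sign changes on a fine grid.

```python
import numpy as np

def F(theta, gamma, K=20):
    """Folded normal density F(theta; 0, gamma), form (2a), truncated at |k| <= K."""
    x = np.asarray(theta)[None, :] - 2 * np.pi * np.arange(-K, K + 1)[:, None]
    return (np.exp(-x**2 / (2 * gamma)) / np.sqrt(2 * np.pi * gamma)).sum(axis=0)

def F_thetatheta(theta, gamma, K=20):
    """Second theta-derivative of (2a), differentiated term by term."""
    x = np.asarray(theta)[None, :] - 2 * np.pi * np.arange(-K, K + 1)[:, None]
    p = np.exp(-x**2 / (2 * gamma)) / np.sqrt(2 * np.pi * gamma)
    return (p * (x**2 / gamma**2 - 1 / gamma)).sum(axis=0)

theta = np.linspace(1e-3, np.pi - 1e-3, 4001)

# Lemma 1: exactly one sign change of the second derivative on (0, pi).
sign_changes = {}
for gamma in (0.1, 1.0, 5.0, 20.0):
    s = np.sign(F_thetatheta(theta, gamma))
    sign_changes[gamma] = int(np.count_nonzero(s[1:] != s[:-1]))

# Heat equation: dF/dgamma = (1/2) d^2F/dtheta^2, via a central difference in gamma.
g, dg = 0.8, 1e-5
dF_dgamma = (F(theta, g + dg) - F(theta, g - dg)) / (2 * dg)
heat_residual = np.max(np.abs(dF_dgamma - 0.5 * F_thetatheta(theta, g)))
print(sign_changes, heat_residual)
```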

Theorem 3 tells us that the intuitive notion that we "have more accurate information" for smaller values of $\gamma$ can be made precise. Also, this theorem implies another result, which is the analog of a problem treated in [24]. The problem treated in [24] is that of finding the optimal linear filter minimizing an asymmetric error criterion on $R^1$ that decreases on $(-\infty, 0]$ and increases on $[0, \infty)$. The result is that the optimal linear filter is the minimum variance filter, and the proof essentially consists of showing that the expected error cost is an increasing function of the variance. Theorem 3 clearly implies an analog of this result.

Some examples of cost criteria satisfying (1) and the associated optimal costs when the density is folded normal are given in the following theorem, whose proof is simple and is therefore omitted.

Theorem 4. Let $p(\theta) = F(\theta; \eta, \gamma)$. Then

(i) $E(1 - \cos(\theta - \eta)) = 1 - \exp(-\gamma/2)$,

(ii) $E(1 - \cos(\theta - \eta))^2 = \frac{3}{2} - 2\exp(-\gamma/2) + \frac{1}{2}\exp(-2\gamma)$,

(iii) $E(\rho(\theta - \eta)) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k=0}^{\infty} \frac{1}{(2k+1)^2}\exp\left(-\frac{(2k+1)^2\gamma}{2}\right)$,

(iv) $E(\rho^2(\theta - \eta)) = \frac{\pi^2}{3} - 4\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k^2}\exp\left(-\frac{k^2\gamma}{2}\right)$.
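The closed forms of Theorem 4 can be compared against direct quadrature, and the same numbers illustrate Theorem 3, since each expected cost should increase with $\gamma$. In this sketch $\eta = 0$ without loss of generality, the series are truncated at 400 terms, and the tolerance allows for the $O(h^2)$ quadrature error caused by the kinks of $\rho$ and $\rho^2$; all of these are assumptions of the check, not part of the theorem.

```python
import numpy as np

M = 8192
theta = -np.pi + 2 * np.pi * np.arange(1, M + 1) / M
h = 2 * np.pi / M
rho = np.abs(theta)                       # rho(theta, 0) for theta in (-pi, pi]
COSTS = ("1-cos", "(1-cos)^2", "rho", "rho^2")

def F(theta, gamma, K=20):
    """Folded normal density F(theta; 0, gamma), form (2a)."""
    x = theta[None, :] - 2 * np.pi * np.arange(-K, K + 1)[:, None]
    return (np.exp(-x**2 / (2 * gamma)) / np.sqrt(2 * np.pi * gamma)).sum(axis=0)

def closed_forms(gamma, K=400):
    """Theorem 4 (i)-(iv), with the series truncated at K terms."""
    k, j = np.arange(K), np.arange(1, K + 1)
    return {
        "1-cos": 1 - np.exp(-gamma / 2),
        "(1-cos)^2": 1.5 - 2 * np.exp(-gamma / 2) + 0.5 * np.exp(-2 * gamma),
        "rho": np.pi / 2 - 4 / np.pi * np.sum(np.exp(-(2*k + 1)**2 * gamma / 2) / (2*k + 1)**2),
        "rho^2": np.pi**2 / 3 - 4 * np.sum((-1.0)**(j + 1) / j**2 * np.exp(-j**2 * gamma / 2)),
    }

def numerical(gamma):
    """The same expectations by rectangle-rule quadrature."""
    p = F(theta, gamma)
    return {
        "1-cos": np.sum((1 - np.cos(theta)) * p) * h,
        "(1-cos)^2": np.sum((1 - np.cos(theta))**2 * p) * h,
        "rho": np.sum(rho * p) * h,
        "rho^2": np.sum(rho**2 * p) * h,
    }

gammas = [0.2, 0.5, 1.0, 2.0, 4.0]
max_err = max(abs(closed_forms(g)[c] - numerical(g)[c]) for g in gammas for c in COSTS)
# Theorem 3: each expected cost increases with the variance gamma.
increasing = all(closed_forms(a)[c] < closed_forms(b)[c]
                 for a, b in zip(gammas, gammas[1:]) for c in COSTS)
print(max_err, increasing)
```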


II.5. Circular Normal Densities.

The multistage estimation for discrete-time systems on $S^1$ involves two operations applied alternately, namely convolution and conditioning (i.e., taking conditional distributions). While the class of folded normal densities is closed under convolution, unfortunately it is not closed under conditioning. The difficulty involved in using folded normal densities for discrete-time estimation was discussed in [6] and [25].

The difficulty is partially resolved [5] if another class of "normal" densities on $S^1$ is used. These densities are called circular normal densities and have the form

$$G(\theta; \eta, \gamma) = \frac{1}{2\pi I_0(\gamma)}\exp\left(\gamma\cos(\theta - \eta)\right),$$

where $I_0(\gamma)$ is the modified Bessel function of the first kind and order zero, i.e.,

$$I_0(\gamma) = \sum_{k=0}^{\infty} \frac{(\gamma/2)^{2k}}{(k!)^2}.$$

The circular normal density was first introduced by Langevin [26] in 1905 and by von Mises [27] in 1918 in the context of statistical mechanics. In contrast to the folded normal densities, the class of circular normal densities is closed under conditioning rather than under convolution [5]. More will be said about this in the next section, after the circular normal density is generalized to the exponential Fourier density.

The class of normal densities on a Euclidean space has closure properties under both convolution and conditioning, which accounts for the success of Kalman filtering for discrete-time systems. Now the folded and the circular normal densities divide these two properties between them. Which one, then, is more "normal" than the other? We recall that the linear normal density has two characterizations, the maximum likelihood characterization and the maximum entropy characterization. It was observed by von Mises [27] and Mardia [4], respectively, that the circular normal density has both characterizations on the circle. However, the Brownian motion on the circle induced by that on the real line through "rolling without slipping" (see Section IV), and a variant form of the central limit theorem on the circle (see [4]), both lead to the folded normal density. Further, the independence of $p(\theta_1)$ and $p(\theta_2)$ and $p(\theta_1) - p(\theta_2)$, where $p$ is an arbitrary function and $\theta_1$ and $\theta_2$ are independent, also leads to the folded normal density (see [4]). Therefore, there may be no answer to the above question. Before we start the next section, let us say a few words about the shape of the circular normal density.

The circular normal density $G(\theta; \eta, \gamma)$ is obviously unimodal and symmetric about the mode $\eta$. The ratio of the density at the mode to that at the antimode $\eta + \pi$ is given by $\exp(2\gamma)$, so that the larger the value of $\gamma$, the greater the clustering around the mode. It can be shown by straightforward calculation (the inflection points are the roots of $\gamma\sin^2\theta = \cos\theta$) that the function $G(\theta; 0, \gamma)$ has two inflection points, at

$$\theta = \pm\arccos\left[\frac{(1 + 4\gamma^2)^{1/2} - 1}{2\gamma}\right].$$
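These shape properties are easy to confirm numerically. The sketch below assumes $\gamma = 2$ and $\eta = 0.3$ for illustration; it checks the normalization (using NumPy's `np.i0` against the series for $I_0$), the mode-to-antimode ratio $\exp(2\gamma)$, and that the factor $\gamma^2\sin^2\theta - \gamma\cos\theta$, to which $G''$ is proportional, vanishes at the stated inflection point.

```python
import math
import numpy as np

def G(theta, eta, gamma):
    """Circular normal density exp(gamma cos(theta - eta)) / (2 pi I_0(gamma))."""
    return np.exp(gamma * np.cos(theta - eta)) / (2 * np.pi * np.i0(gamma))

def i0_series(gamma, K=60):
    """I_0(gamma) = sum_k (gamma / 2)^(2k) / (k!)^2, truncated at K terms."""
    return sum((gamma / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(K))

def inflection(gamma):
    """Positive inflection point of G(theta; 0, gamma)."""
    return math.acos((math.sqrt(1 + 4 * gamma**2) - 1) / (2 * gamma))

M = 2048
theta = -np.pi + 2 * np.pi * np.arange(1, M + 1) / M
gamma = 2.0                                                   # assumed for illustration
total_mass = G(theta, 0.3, gamma).sum() * 2 * np.pi / M       # should be 1
ratio = G(0.3, 0.3, gamma) / G(0.3 + np.pi, 0.3, gamma)       # should be exp(2 gamma)

# G'' = (gamma^2 sin^2 theta - gamma cos theta) G, so this factor vanishes at the inflection.
t0 = inflection(gamma)
curvature_factor = gamma**2 * math.sin(t0) ** 2 - gamma * math.cos(t0)
print(total_mass, ratio, math.exp(2 * gamma), curvature_factor)
```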

III. DISCRETE-TIME ESTIMATION ON THE CIRCLE.

Estimation for discrete-time systems on the circle was studied in [6] and [25], using both folded normal densities and Fourier series representations of probability densities. The optimal estimation equations obtained therein are infinite-dimensional and cumbersome. Although some numerical simulation has been done on the suboptimal equations obtained by truncating the higher order terms, it is not clear whether these equations have satisfactory performance in general.

As a matter of fact, the "dimension" of the optimal estimation equations derived from the folded normal densities increases very rapidly in time. When Fourier series are used to represent probability densities, the application of Bayes' rule, which involves the multiplication of two densities, has the effect of spreading the dominant Fourier coefficients into the higher order terms. This difficulty is compounded in a multistage estimation problem, where a sequence of multiplications of Fourier series takes place.

In this section, we will present an alternative approach. The approach is based on a new class of probability density functions which have the form

$$p(x) = \exp\left[\sum_{k=0}^{n} (a_k\cos kx + b_k\sin kx)\right].$$

Such a density will be called an exponential Fourier density of order $n$, to be denoted by EFD($n$). We note that the circular normal density introduced in the previous section is exactly the EFD(1).

III.1. Exponential Fourier Densities on the Circle.

There are two reasons for using the exponential Fourier densities. First, the product of two EFD($n$)'s, once normalized, is again an EFD($n$), because the exponents simply add; the multiplication does not raise the order of the densities. Thus the class of $n$-th order exponential Fourier densities is closed under the operation of taking conditional distributions.
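The closure property amounts to adding coefficient vectors and recomputing the normalizing coefficient $a_0$. The sketch below uses a hypothetical coefficient-vector representation (arrays `a`, `b` indexed by $k = 0, \ldots, n$, with $b_0$ unused since $\sin 0 = 0$) and randomly chosen coefficients; it checks that the EFD obtained by adding coefficients coincides with the normalized pointwise product of the two densities.

```python
import numpy as np

M = 1024
theta = -np.pi + 2 * np.pi * np.arange(1, M + 1) / M
h = 2 * np.pi / M

def efd_exponent(x, a, b):
    """sum_{k=0}^n (a_k cos kx + b_k sin kx); b_0 plays no role since sin 0 = 0."""
    k = np.arange(len(a))
    kx = np.outer(k, np.atleast_1d(x))
    return a @ np.cos(kx) + b @ np.sin(kx)

def normalize(a, b):
    """Return a copy of a with a_0 chosen so that the EFD integrates to one."""
    a = np.array(a, dtype=float)
    a[0] = 0.0
    L = efd_exponent(theta, a, b)
    a[0] = -(L.max() + np.log(np.exp(L - L.max()).sum() * h))   # stable log-normalizer
    return a

def efd_product(a1, b1, a2, b2):
    """Normalized product of two EFD(n)'s: the coefficients simply add."""
    b = np.asarray(b1, float) + np.asarray(b2, float)
    return normalize(np.asarray(a1, float) + np.asarray(a2, float), b), b

rng = np.random.default_rng(1)           # assumed random test coefficients, order n = 3
n = 3
a1, b1 = rng.normal(size=n + 1), rng.normal(size=n + 1)
a2, b2 = rng.normal(size=n + 1), rng.normal(size=n + 1)
a1, a2 = normalize(a1, b1), normalize(a2, b2)
a3, b3 = efd_product(a1, b1, a2, b2)

p1, p2 = np.exp(efd_exponent(theta, a1, b1)), np.exp(efd_exponent(theta, a2, b2))
p3 = np.exp(efd_exponent(theta, a3, b3))
product_gap = np.max(np.abs(p3 - p1 * p2 / ((p1 * p2).sum() * h)))
```

Only the normalizing constant requires an integral; all other coefficients combine by addition, which is what keeps the recursions below finite-dimensional.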

Another reason for using EFD's is that any continuous or bounded-variation probability density can be approximated by an EFD as closely as desired with respect to the square-integral norm. This property enables us to use an EFD($n$) as a mathematical model of any probability distribution on the circle. Both this and the aforementioned closure property can be generalized to compact Lie groups and some homogeneous spaces, as will be seen in Section V.

Before we illustrate how the EFD's are used to deduce finite-dimensional, closed-form, and recursive equations to update the conditional densities of the signal given the observations, we state the approximation property in the following theorem, a general version of which for compact Lie groups will be proven in Section V.

Theorem. Let $p$ be a continuous probability density on $S^1$. For any given positive number $\epsilon$, there exists an exponential Fourier density

$$p_n(x) = \exp\sum_{k=0}^{n} (a_k\cos kx + b_k\sin kx)$$

such that

$$\int_{-\pi}^{\pi} (p(x) - p_n(x))^2\, dx < \epsilon.$$
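The theorem is an existence statement. When $p$ is strictly positive and smooth, one simple construction (an assumption of this sketch, not necessarily the construction used in Section V) is to truncate the Fourier series of $\log p$ at order $n$, exponentiate, and renormalize. The sketch applies this to an assumed bimodal mixture of two circular normal densities and records the squared $L^2$ error for increasing $n$.

```python
import numpy as np

M = 1024
theta = -np.pi + 2 * np.pi * np.arange(1, M + 1) / M
h = 2 * np.pi / M

def von_mises(x, mu, kappa):
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

# Assumed strictly positive, bimodal test density.
p = 0.6 * von_mises(theta, 0.0, 2.0) + 0.4 * von_mises(theta, 2.5, 1.0)

def efd_fit(log_p, n):
    """Truncate the Fourier series of log p at order n, exponentiate, and renormalize."""
    k = np.arange(n + 1)[:, None]
    a = (log_p * np.cos(k * theta)).sum(axis=1) * h / np.pi
    b = (log_p * np.sin(k * theta)).sum(axis=1) * h / np.pi
    a[0] /= 2                              # constant term of the Fourier series
    q = np.exp(a @ np.cos(k * theta) + b @ np.sin(k * theta))
    return q / (q.sum() * h)

orders = [1, 2, 4, 8, 16]
errors = [np.sum((p - efd_fit(np.log(p), n)) ** 2) * h for n in orders]
print(dict(zip(orders, errors)))           # squared L2 error shrinks as n grows
```

If $p$ vanishes somewhere, $\log p$ is unbounded and this particular construction breaks down, although the theorem itself still applies.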
III.2. A Basic System on $S^1$.

Assume that the signal and the measurement processes are governed by the equations

$$s_{n+1} = s_n \oplus w_n, \qquad m_n = s_n \oplus v_n,$$

where $\{w_n\}$ is a given deterministic process on $S^1$, and $\{v_n\}$ is a white random process on $S^1$. The random variables $s_0$ and $v_n$ are assumed to be independent, with the following exponential Fourier densities:

$$p(s_0) = \exp\sum_{k=0}^{N} (a_{0k}\cos ks_0 + b_{0k}\sin ks_0),$$

$$p(v_n) = \exp\sum_{k=0}^{N} (\alpha_{nk}\cos kv_n + \beta_{nk}\sin kv_n).$$

By Bayes' rule,

$$p(s_{n+1} \mid m^{n+1}) = c_{n+1}\, p(m_{n+1} \mid s_{n+1})\, p(s_{n+1} \mid m^n), \tag{8}$$

where $m^n$ denotes the measurements up to time $n$ and $c_{n+1} = 1/p(m_{n+1} \mid m^n)$ is a normalizing constant. It can easily be shown that the conditional densities involved can be written as the following exponential Fourier densities:

$$p(s_n \mid m^n) = \exp\sum_{k=0}^{N} (a_{nk}\cos ks_n + b_{nk}\sin ks_n), \tag{9}$$

$$p(s_{n+1} \mid m^n) = \exp\sum_{k=0}^{N} \left(a_{nk}\cos k(s_{n+1} - w_n) + b_{nk}\sin k(s_{n+1} - w_n)\right), \tag{10}$$

$$p(m_{n+1} \mid s_{n+1}) = \exp\sum_{k=0}^{N} \left(\alpha_{n+1,k}\cos k(m_{n+1} - s_{n+1}) + \beta_{n+1,k}\sin k(m_{n+1} - s_{n+1})\right),$$

where $a_{nk}$ and $b_{nk}$ are to be determined. Substituting these two equations into (8), we obtain
$$\begin{aligned}
p(s_{n+1} \mid m^{n+1}) = c_{n+1}\exp\sum_{k=0}^{N}\Big[ &(a_{nk}\cos kw_n - b_{nk}\sin kw_n + \alpha_{n+1,k}\cos km_{n+1} + \beta_{n+1,k}\sin km_{n+1})\cos ks_{n+1} \\
&+ (a_{nk}\sin kw_n + b_{nk}\cos kw_n + \alpha_{n+1,k}\sin km_{n+1} - \beta_{n+1,k}\cos km_{n+1})\sin ks_{n+1}\Big].
\end{aligned}$$

Thus we obtain the following recursive formulas for $a_{nk}$ and $b_{nk}$ which, in turn, give us the desired conditional densities $p(s_n \mid m^n)$:

$$a_{n+1,k} = a_{nk}\cos kw_n - b_{nk}\sin kw_n + \alpha_{n+1,k}\cos km_{n+1} + \beta_{n+1,k}\sin km_{n+1},$$

$$b_{n+1,k} = a_{nk}\sin kw_n + b_{nk}\cos kw_n + \alpha_{n+1,k}\sin km_{n+1} - \beta_{n+1,k}\cos km_{n+1},$$

for $k = 1, 2, \ldots, N$, and

$$p(s_{n+1} \mid m^{n+1}) = \exp\sum_{k=0}^{N} (a_{n+1,k}\cos ks_{n+1} + b_{n+1,k}\sin ks_{n+1}),$$

where $a_{n+1,0}$ is a normalizing constant.
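The recursion can be checked against a brute-force application of Bayes' rule on a grid. In the sketch below the order $N = 3$, the random coefficients, $w_n$, and $m_{n+1}$ are all assumed test values, and `basic_update` is a hypothetical name. One step of the coefficient recursion is compared with the normalized product of the shifted prior $p(s_{n+1} - w_n \mid m^n)$ and the noise density evaluated at $m_{n+1} - s_{n+1}$.

```python
import numpy as np

M = 2048
grid = -np.pi + 2 * np.pi * np.arange(1, M + 1) / M
h = 2 * np.pi / M

def efd_exponent(x, a, b):
    k = np.arange(len(a))
    kx = np.outer(k, np.atleast_1d(x))
    return a @ np.cos(kx) + b @ np.sin(kx)

def normalize(a, b):
    """Choose a_0 so that exp(sum_k a_k cos kx + b_k sin kx) integrates to one."""
    a = np.array(a, dtype=float)
    a[0] = 0.0
    L = efd_exponent(grid, a, b)
    a[0] = -(L.max() + np.log(np.exp(L - L.max()).sum() * h))
    return a

def basic_update(a, b, w, m_next, alpha, beta):
    """One step (a_n, b_n) -> (a_{n+1}, b_{n+1}) for s_{n+1} = s_n (+) w_n, m_n = s_n (+) v_n."""
    k = np.arange(len(a))
    ckw, skw = np.cos(k * w), np.sin(k * w)
    ckm, skm = np.cos(k * m_next), np.sin(k * m_next)
    a_new = a * ckw - b * skw + alpha * ckm + beta * skm
    b_new = a * skw + b * ckw + alpha * skm - beta * ckm
    b_new[0] = 0.0                          # irrelevant, since sin(0 * s) = 0
    return normalize(a_new, b_new), b_new   # a_{n+1,0} is the normalizing constant

rng = np.random.default_rng(2)              # assumed test values
N = 3
b0 = rng.normal(size=N + 1) * 0.5
a0 = normalize(rng.normal(size=N + 1) * 0.5, b0)
alpha = rng.normal(size=N + 1) * 0.5        # noise coefficients alpha_{n+1,k}, beta_{n+1,k}
beta = rng.normal(size=N + 1) * 0.5
w, m_next = 0.9, -2.2

a1, b1 = basic_update(a0, b0, w, m_next, alpha, beta)

# Brute force on the grid: shifted prior times the noise density at m_{n+1} - s.
log_post = efd_exponent(grid - w, a0, b0) + efd_exponent(m_next - grid, alpha, beta)
post = np.exp(log_post - log_post.max())
post /= post.sum() * h
max_err = np.max(np.abs(np.exp(efd_exponent(grid, a1, b1)) - post))
```

Each step costs $O(N)$ operations plus one normalization, independently of how many measurements have been processed.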
III.3. A Phase-Shift-Keyed System.


Consider the signal and the measurement processes governed by the equations

$$s_{n+1} = s_n \oplus w_n, \qquad m_n = \cos(\omega t_n + s_n) + v_n,$$

where $\{w_n\}$ is a given deterministic process on $S^1$ and $\{v_n\}$ is a white Gaussian sequence with zero mean and variances $\sigma_n^2$. The probability density of $s_0$ is assumed to be the exponential Fourier density

$$p(s_0) = \exp\sum_{k=0}^{N} (a_{0k}\cos ks_0 + b_{0k}\sin ks_0).$$

We note that the measurement process $\{m_n\}$ can be viewed as a sampled sinusoidal wave modulated by a random phase process $\{s_n\}$ and corrupted by additive white Gaussian noise $\{v_n\}$. The special case of this model where $p(s_0)$ is a first-order exponential Fourier density has been solved in [13]. Here again, by Bayes' rule and straightforward calculation, we have (8)-(10). As $v_{n+1}$ is a Gaussian random variable, it follows that

$$p(m_{n+1} \mid s_{n+1}) = \frac{1}{\sqrt{2\pi}\,\sigma_{n+1}}\exp\left[-\frac{\left(m_{n+1} - \cos(\omega t_{n+1} + s_{n+1})\right)^2}{2\sigma_{n+1}^2}\right]. \tag{11}$$

Substituting (11) and (10) into (8) yields
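$$\begin{aligned}
p(s_{n+1} \mid m^{n+1}) &= \frac{c_{n+1}}{\sqrt{2\pi}\,\sigma_{n+1}}\exp\Big\{\sum_{k=0}^{N}\big[a_{nk}\cos k(s_{n+1} - w_n) + b_{nk}\sin k(s_{n+1} - w_n)\big] - \frac{\left(m_{n+1} - \cos(\omega t_{n+1} + s_{n+1})\right)^2}{2\sigma_{n+1}^2}\Big\} \\
&= \frac{c_{n+1}}{\sqrt{2\pi}\,\sigma_{n+1}}\exp\Big\{\sum_{k=0}^{N}\big[(a_{nk}\cos kw_n - b_{nk}\sin kw_n)\cos ks_{n+1} + (a_{nk}\sin kw_n + b_{nk}\cos kw_n)\sin ks_{n+1}\big] \\
&\qquad + \frac{m_{n+1}}{\sigma_{n+1}^2}\cos\omega t_{n+1}\cos s_{n+1} - \frac{m_{n+1}}{\sigma_{n+1}^2}\sin\omega t_{n+1}\sin s_{n+1} \\
&\qquad - \frac{1}{4\sigma_{n+1}^2}\cos 2\omega t_{n+1}\cos 2s_{n+1} + \frac{1}{4\sigma_{n+1}^2}\sin 2\omega t_{n+1}\sin 2s_{n+1} - \frac{m_{n+1}^2}{2\sigma_{n+1}^2} - \frac{1}{4\sigma_{n+1}^2}\Big\}.
\end{aligned}$$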

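Thus we obtain the following recursive formulas for $a_{nk}$ and $b_{nk}$, which, in turn, give us the desired conditional densities $p(s_n\mid m^n)$ (here $N \ge 2$, which can always be arranged by setting the higher-order initial coefficients to zero):

$$a_{n+1,1} = a_{n1}\cos w_n - b_{n1}\sin w_n + \frac{m_{n+1}}{\sigma_{n+1}^2}\cos\omega t_{n+1},$$
$$b_{n+1,1} = a_{n1}\sin w_n + b_{n1}\cos w_n - \frac{m_{n+1}}{\sigma_{n+1}^2}\sin\omega t_{n+1},$$
$$a_{n+1,2} = a_{n2}\cos 2w_n - b_{n2}\sin 2w_n - \frac{1}{4\sigma_{n+1}^2}\cos 2\omega t_{n+1},$$
$$b_{n+1,2} = a_{n2}\sin 2w_n + b_{n2}\cos 2w_n + \frac{1}{4\sigma_{n+1}^2}\sin 2\omega t_{n+1};$$

and, for $k = 3, 4, \ldots, N$, recursively

$$a_{n+1,k} = a_{nk}\cos kw_n - b_{nk}\sin kw_n,$$
$$b_{n+1,k} = a_{nk}\sin kw_n + b_{nk}\cos kw_n.$$

Then

$$p(s_{n+1}\mid m^{n+1}) = \exp\sum_{k=0}^{N}(a_{n+1,k}\cos ks_{n+1} + b_{n+1,k}\sin ks_{n+1}),$$

where $a_{n+1,0}$ is a normalizing constant.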
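The recursion above is simple enough to code directly. The following Python sketch (NumPy assumed; the names `log_efd`, `normalize`, and `efd_update` are hypothetical) stores $a_{nk}, b_{nk}$ for $k = 0, \ldots, N$, rotates harmonic $k$ by $kw_n$, adds the measurement terms to the first and second harmonics, and then recomputes $a_{n+1,0}$ numerically on a grid. It assumes $N \ge 2$ and is offered as an illustration of the formulas, not as the report's implementation.

```python
import numpy as np

# Quadrature grid on [-pi, pi); used only to compute the normalizer a_{n+1,0}.
S = np.linspace(-np.pi, np.pi, 2048, endpoint=False)


def log_efd(a, b, s):
    """Exponent sum_k (a_k cos ks + b_k sin ks) evaluated at the points s."""
    k = np.arange(len(a))[:, None]
    s = np.atleast_1d(s)[None, :]
    return (a[:, None] * np.cos(k * s) + b[:, None] * np.sin(k * s)).sum(axis=0)


def normalize(a, b):
    """Shift a_0 so that exp(log_efd) integrates to one over one period."""
    f = log_efd(a, b, S)
    fmax = f.max()
    mass = np.exp(f - fmax).mean() * 2.0 * np.pi  # periodic rectangle rule
    a = a.copy()
    a[0] -= fmax + np.log(mass)
    return a, b


def efd_update(a, b, w_n, m_next, sigma_next, omega, t_next):
    """Map (a_n, b_n) to (a_{n+1}, b_{n+1}) for m = cos(omega t + s) + v."""
    k = np.arange(len(a))
    # Prediction s_{n+1} = s_n + w_n: harmonic k is rotated by the angle k*w_n.
    a_new = a * np.cos(k * w_n) - b * np.sin(k * w_n)
    b_new = a * np.sin(k * w_n) + b * np.cos(k * w_n)
    # Measurement update: the Gaussian likelihood only touches harmonics 1 and 2.
    g = m_next / sigma_next**2
    h = 1.0 / (4.0 * sigma_next**2)
    a_new[1] += g * np.cos(omega * t_next)
    b_new[1] -= g * np.sin(omega * t_next)
    a_new[2] -= h * np.cos(2.0 * omega * t_next)
    b_new[2] += h * np.sin(2.0 * omega * t_next)
    return normalize(a_new, b_new)


# Usage: uniform prior on s_0 and simulated data (hypothetical parameter values).
rng = np.random.default_rng(0)
N, omega, sigma, w = 4, 1.3, 0.5, 0.1
a, b = normalize(np.zeros(N + 1), np.zeros(N + 1))
s_true = rng.uniform(-np.pi, np.pi)
for n in range(20):
    s_true += w
    t_next = float(n + 1)
    m = np.cos(omega * t_next + s_true) + sigma * rng.standard_normal()
    a, b = efd_update(a, b, w, m, sigma, omega, t_next)
```

The only numerical step is the normalizer; every other coefficient follows the closed-form recursion exactly, so the number of stored parameters stays $2(N+1)$ no matter how many measurements are processed.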


III. A. Periodic Measurements in Additive White Gaussian Noise

Consider the signal and measurement processes

$$s_{n+1} = s_n \oplus w_n,$$
$$m_n = h(s_n) + v_n,$$

where $\{w_n\}$ and $\{v_n\}$ are as in the previous section and where $h$ is a periodic function with a period of $2\pi$.

The periodicity of $h$ allows us to approximate it by a finite Fourier series, as closely as we wish in the space of square-integrable functions. In other words, for any $\varepsilon > 0$, there exists a positive integer $M$ such that

$$\|h - h_M\| < \varepsilon,$$

where

$$h_M(s) := \sum_{k=0}^{M}(f_k\cos ks + g_k\sin ks). \tag{12}$$


Without loss of generality, we may assume that $N \ge 2M$, with $M$ as in (12), for otherwise we can set $a_{0k} = b_{0k} = 0$ for $N < k \le 2M$ and write $p(s_0) = \exp\sum_{k=0}^{2M}(a_{0k}\cos ks_0 + b_{0k}\sin ks_0)$. We can also assume that $f_0 = 0$, for otherwise $f_0$ can be incorporated into $m_n$. Assume that $p(s_n\mid m^n) = \exp\sum_{k=0}^{N}(a_{nk}\cos ks_n + b_{nk}\sin ks_n)$. By Bayes' rule and straightforward calculation, with $h$ replaced by $h_M$, we obtain


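$$p(s_{n+1}\mid m^{n+1}) = \frac{c_{n+1}}{\sqrt{2\pi}\,\sigma_{n+1}}\exp\Big[-\frac{1}{2\sigma_{n+1}^2}\Big(m_{n+1} - \sum_{k=0}^{M}(f_k\cos ks_{n+1} + g_k\sin ks_{n+1})\Big)^2 + \sum_{k=0}^{N}\big(a_{nk}\cos k(s_{n+1} - w_n) + b_{nk}\sin k(s_{n+1} - w_n)\big)\Big]. \tag{13}$$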


We note that the function in the above bracket can be written as a finite Fourier series of order $N$ in the variable $s_{n+1}$, since the squared term is a trigonometric polynomial of order at most $2M \le N$. This shows by induction that for all $n = 1, 2, \ldots$, $p(s_n\mid m^n)$ is an exponential Fourier density of order $N$; the recursive formulas for $a_{nk}$ and $b_{nk}$ can be obtained straightforwardly from (13). However, the formulas are tedious and will not be displayed here.
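Although the explicit recursion is tedious, the update implied by (13) can be carried out numerically without writing it out. Because the exponent in (13) is a trigonometric polynomial of order at most $N$, sampling it at $K = 2N+1$ equispaced points and taking a discrete Fourier transform recovers $a_{n+1,k}$, $b_{n+1,k}$ exactly up to rounding. The Python sketch below (NumPy assumed; all function names are hypothetical) does this. It is a numerical alternative to the closed-form formulas, not the report's method.

```python
import numpy as np


def trig_coeffs(values, N):
    """(a_k, b_k), k = 0..N, of a real trigonometric polynomial of order <= N,
    recovered from its samples at s_j = 2*pi*j/K with K >= 2N + 1."""
    K = len(values)
    c = np.fft.rfft(values) / K          # c_k = (a_k - i b_k)/2 for 1 <= k <= N
    a = 2.0 * c.real[: N + 1]
    b = -2.0 * c.imag[: N + 1]
    a[0] = c.real[0]                     # the constant term is not doubled
    b[0] = 0.0
    return a, b


def trig_eval(a, b, s):
    """sum_k (a_k cos ks + b_k sin ks) at the points s."""
    k = np.arange(len(a))[:, None]
    return (a[:, None] * np.cos(k * s) + b[:, None] * np.sin(k * s)).sum(axis=0)


def periodic_update(a, b, w_n, m_next, sigma_next, f, g):
    """Coefficients of p(s_{n+1} | m^{n+1}) from (13), with h replaced by h_M.
    a, b: order-N coefficients of p(s_n | m^n); f, g: order-M coefficients of h_M."""
    N = len(a) - 1
    assert 2 * (len(f) - 1) <= N, "need N >= 2M"
    K = 2 * N + 1
    s = 2.0 * np.pi * np.arange(K) / K
    exponent = trig_eval(a, b, s - w_n) - (m_next - trig_eval(f, g, s)) ** 2 / (2.0 * sigma_next**2)
    a_new, b_new = trig_coeffs(exponent, N)
    # Fix a_{n+1,0} so that the density integrates to one (fine periodic grid).
    fine = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
    vals = trig_eval(a_new, b_new, fine)
    vmax = vals.max()
    a_new[0] -= vmax + np.log(np.exp(vals - vmax).mean() * 2.0 * np.pi)
    return a_new, b_new


# Usage with a hypothetical h_M of order M = 2 (f_0 = 0) and N = 2M = 4.
f = np.array([0.0, 1.0, 0.3])
g = np.array([0.0, 0.0, -0.5])
a, b = np.zeros(5), np.zeros(5)          # uniform prior (a_0 is renormalized anyway)
a, b = periodic_update(a, b, w_n=0.2, m_next=0.7, sigma_next=0.4, f=f, g=g)
```

Sampling at exactly $2N+1$ points is the smallest grid that avoids aliasing for an order-$N$ polynomial; the finer grid is used only to compute the normalizing constant.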
IV. CONTINUOUS-TIME ESTIMATION ON THE CIRCLE.

A signal process and an observation process, taking values on $S^1$, will be formulated in terms of bilinear Ito matrix differential equations. The conditional probability distribution of the signal, given observations over a certain period of time, will be evaluated. Recursive computational schemes for optimal estimation (filtering, smoothing, and prediction), with respect to the error criteria defined in Subsection II.3, will be derived. In fact, it will be shown that optimal estimates on $S^1$ can be obtained recursively by the use of an ordinary vector-space estimator together with a nonlinear preprocessor and a nonlinear postprocessor. Multichannel estimation on abelian Lie groups will be examined. Examples illustrating the optimal estimation procedure are given at the end of this section.

IV.  1.  Signal  Processes  and  Observation  Processes 

Consider a unit circle in $R^2$ with a line tangent to it.

We allow the line to perform a one-dimensional continuous translation (along itself); fix the center of the circle and require that there be no slipping at the point of tangency. The line then induces a rotation of the circle, and if the line moves a distance $x$, the circle rotates $x$ radians and so is $x \bmod 2\pi$ radians away from its initial orientation.
This method, called "rolling without slipping", will now be used to construct a continuous signal process on $S^1$ and to formulate the mathematical model of a sensor (i.e., an observation process) to be used in this report.

We will adopt the following notation:

$(\Omega, \mathcal{A}, P)$ = a probability space;
$s$ = a positive real number;
$C_1^s$ = the family of real-valued continuous functions $a$ on $[0,s]$ such that $a(0) = 0$;
$\mathcal{B}_1^s$ = the Borel $\sigma$-field of $C_1^s$;
$C_2^s$ = the family of $2\times 2$ orthogonal-matrix-valued continuous functions $A$ on $[0,s]$ such that $A(0)$ is the identity matrix $I$;
$\mathcal{B}_2^s$ = the Borel $\sigma$-field of $C_2^s$.

Lower case letters denote elements in $C_1^s$ and upper case letters denote elements in $C_2^s$.

Let $J: C_1^s \to C_2^s$ be defined by

$$(J(a))(t) = \exp(a(t)R) = \begin{bmatrix}\cos a(t) & \sin a(t)\\ -\sin a(t) & \cos a(t)\end{bmatrix} \tag{14}$$

for $a \in C_1^s$ and $t \in [0,s]$, where $R = \begin{bmatrix}0 & 1\\ -1 & 0\end{bmatrix}$. It is easily seen that $J$ is $\mathcal{B}_1^s$-measurable and bijective. A point on the unit circle $S^1$ can be represented by either the angle $\theta \in [-\pi, \pi)$ it makes with a fixed radial axis or the $2\times 2$ orthogonal matrix $\exp(R\theta)$. Therefore, in the first representation, $C_2^s$ is the family of piecewise continuous functions $\theta(t)$ such that at any point of discontinuity the right-hand limit of $\theta$ is $\pm\pi$, while the left-hand limit is $\mp\pi$.

Each continuous curve $a(t)$ on $R^1$ gives rise to one and only one piecewise continuous curve $\theta(t)$ lying between $-\pi$ and $\pi$, whose continuous segments are obtained by translating the corresponding segments of $a(t)$ by an integral multiple of $2\pi$. Conversely, each piecewise continuous curve in $C_2^s$ gives rise to one and only one continuous curve taking values on $R^1$, which is obtained simply by piecing the continuous segments together. This intuitive observation illustrates the bijective property of the operator $J$. Thus, a continuous random signal process on $S^1$ which is described by an $\mathcal{A}$-measurable function $X: \Omega \to C_2^s$ corresponds to a continuous random signal process on $R^1$ which is described by an $\mathcal{A}$-measurable function $x: \Omega \to C_1^s$ such that

$$X(t) = (J(x))(t), \quad t \in [0,s].$$
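The correspondence between continuous angle paths and paths on $S^1$ can be made concrete on sampled paths. In the Python sketch below (NumPy assumed; `J` and `J_inv` are hypothetical names), `J` implements (14) and `J_inv` recovers the continuous lift by piecing the continuous segments together, which for sampled data is phase unwrapping. The sampling assumption (consecutive samples differ by less than $\pi$) is ours; the operator $J$ itself acts on continuous paths.

```python
import numpy as np


def J(a):
    """Eq. (14): angle path a(t) -> matrix path exp(a(t) R) with
    R = [[0, 1], [-1, 0]], i.e. [[cos a, sin a], [-sin a, cos a]]."""
    a = np.asarray(a, dtype=float)
    c, s = np.cos(a), np.sin(a)
    return np.stack([np.stack([c, s], axis=-1), np.stack([-s, c], axis=-1)], axis=-2)


def J_inv(A):
    """Inverse of J on sampled paths with A(0) = I: read the folded angle from
    each matrix, then remove the 2*pi jumps between consecutive samples
    (valid when consecutive samples differ by less than pi)."""
    theta = np.arctan2(A[..., 0, 1], A[..., 0, 0])
    return np.unwrap(theta)


# A path that winds around the circle several times, with a(0) = 0.
t = np.linspace(0.0, 1.0, 2001)
a = 20.0 * t + 3.0 * np.sin(7.0 * t)
A = J(a)
theta = np.arctan2(A[:, 0, 1], A[:, 0, 0])  # piecewise continuous, jumps of 2*pi
```

Here `theta` is the piecewise continuous representative in $C_2^s$ described above, and `J_inv(A)` returns the original continuous curve `a`.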

g 

We now define a random process $z: \Omega \to C_1^s$ by the K. Ito random differential equation

$$dz(t) = m(x(t),t)\,dt + q^{1/2}(t)\,dw(t), \quad z(0) = 0,$$

where $m: R^1 \times [0,s] \to R^1$ is Borel-measurable, $q: [0,s] \to R^1$ is positive and measurable, and $w$ is the standard Brownian motion on $(\Omega, \mathcal{A}, P)$, independent of $x$. Let $Z: \Omega \to C_2^s$ be defined by

$$Z(t) = (J(z))(t).$$

Applying the Ito differentiation rule, we obtain the following Ito matrix differential equation:

$$dZ(t) = Z(t)\begin{bmatrix}-\frac{q(t)}{2} & m(t)\\ -m(t) & -\frac{q(t)}{2}\end{bmatrix}dt + Z(t)\begin{bmatrix}0 & q^{1/2}(t)\,dw(t)\\ -q^{1/2}(t)\,dw(t) & 0\end{bmatrix}, \quad Z(0) = I, \tag{15}$$

where $m(t) = m(x(t),t)$ and the diagonal terms are the second-order correction terms which keep $Z$ on the circle. Indeed, since $R^2 = -I$ and $(dz)^2 = q\,dt$, Ito's rule gives $dZ = ZR\,dz - \frac12 qZ\,dt$. This equation is the mathematical model of the sensor to be used. We note that the input $x(t)$ to the sensor is not the dynamical state $X(t)$ of the rotational signal process on the circle, but rather the angle the rotational process has swept.

The physical motivation for this sensor model comes from the fact that in observing a rotational process (for instance, a gyroscope recording rotation about a fixed axis) our measurement contains information on the total rotation, $x(t)$, not just the orientation, $X(t)$.

In some applications, such as the gyro problem mentioned above, we wish to extract knowledge of orientation from knowledge of rotation, so it is proper to regard $X(t)$ as the signal process. However, in other applications, such as FM demodulation, our interest centers on the $x$ process, and in these cases we may regard $x$ as the signal.
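The role of the diagonal terms in (15) can be checked numerically. The Python sketch below (NumPy assumed; constant $q$, a constant drift $m$, and the name `simulate_sensor` are our simplifying assumptions) integrates (15) by the Euler-Maruyama method with and without the $-q/2$ corrections. Every factor in the product has the form $\alpha I + \beta R$, so $\det Z(t) = \alpha^2 + \beta^2$ measures how far $Z$ has drifted off the circle.

```python
import numpy as np

R = np.array([[0.0, 1.0], [-1.0, 0.0]])
I2 = np.eye(2)


def simulate_sensor(m, q, dt, dW, correct=True):
    """Euler-Maruyama for (15) with constant q:
    dZ = Z [(-q/2) I + m R] dt + Z R q^{1/2} dW,  Z(0) = I.
    m, dW: arrays of drift values m(t_k) and Brownian increments.
    correct=False drops the diagonal -q/2 (second-order) terms."""
    Z = I2.copy()
    diag = -0.5 * q if correct else 0.0
    for mk, dWk in zip(m, dW):
        Z = Z @ (I2 + (diag * I2 + mk * R) * dt + np.sqrt(q) * dWk * R)
    return Z


rng = np.random.default_rng(0)
T, dt, q = 1.0, 1e-4, 1.0
n = int(round(T / dt))
dW = rng.normal(0.0, np.sqrt(dt), n)
m = np.ones(n)  # hypothetical drift m(x(t), t) = 1
# det(Z) = 1 exactly on the circle; it measures the drift off S^1.
det_with = np.linalg.det(simulate_sensor(m, q, dt, dW, correct=True))
det_without = np.linalg.det(simulate_sensor(m, q, dt, dW, correct=False))
```

With the correction the determinant stays close to 1; without it, the accumulated Ito terms $q\,(dW)^2$ make it grow roughly like $e^{qt}$.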


IV.  2.  Conditional  Probability  Distributions 


In this subsection, we will derive equations for the conditional probability distribution of the signal process given observations over some time period. The approach of this section is measure-theoretic in nature, and the major results are summarized in the statements of Lemma 2 and of Theorem 5 and its two corollaries.
Let us denote $\{z(\tau), \tau \in [0,t]\}$ and $\{Z(\tau), \tau \in [0,t]\}$ by $z^t$ and $Z^t$, respectively. We note that $Z^t = J(z^t)$. Since $J$ is bijective from $C_1^t$ to $C_2^t$, the $\sigma$-subfield of $\mathcal{A}$ generated by $z^t$ is the same as that generated by $Z^t$. In other words, the information carried by $z^t$ and $Z^t$ is the same. That $\sigma$-subfield will be denoted by $\mathcal{A}_z^t$. The $\sigma$-subfield of $\mathcal{A}$ which is generated by $X_\lambda = X(\lambda)$ (the subscripts $\lambda, s, t$ denote that the processes are evaluated at these times) will be denoted by $\mathcal{A}_\lambda$.

Let $P_{xz}$ be the conditional probability measure on $(\Omega, \mathcal{A}_\lambda)$ given $\mathcal{A}_z^t$, defined by $P_{xz}(A, \omega_2) = P(A \mid \mathcal{A}_z^t)(\omega_2)$, for $A \in \mathcal{A}_\lambda$. Let $P_{zx}$ be the conditional probability measure on $(\Omega, \mathcal{A}_z^t)$ given $\mathcal{A}_\lambda$, defined by $P_{zx}(B, \omega_1) = P(B \mid \mathcal{A}_\lambda)(\omega_1)$, for $B \in \mathcal{A}_z^t$, $\omega_1 \in \Omega$. The restrictions of $P$ to $\mathcal{A}_z^t$ and $\mathcal{A}_\lambda$ are denoted by $P_z$ and $P_x$, respectively. Let $\mu_z$ and $\mu_w$ be the measures induced on $(C_1^t, \mathcal{B}_1^t)$ by $z^t$ and $w^t$, respectively. Define the conditional measures $\mu_{zx}$ on $(C_1^t, \mathcal{B}_1^t)$, given $X_\lambda$, by $\mu_{zx}(B, \omega_1) = P_{zx}((z^t)^{-1}(B), \omega_1)$ for $B \in \mathcal{B}_1^t$.

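It is known [28] that $\mu_z \approx \mu_w$, where $\approx$ denotes equivalence of measures, and that

$$\frac{d\mu_z}{d\mu_w} = E_x[\theta^t].$$

Here $E_x$ means taking the average over $x$. Further,

$$\frac{d\mu_{zx}}{d\mu_w} = \theta^t,$$

where

$$\theta^t = \exp\Big(-\frac12\int_0^t\frac{m^2(\tau)}{q(\tau)}\,d\tau + \int_0^t\frac{m(\tau)}{q(\tau)}\,dz(\tau, \omega_2)\Big). \tag{16}$$

We note that $\frac{dP_{zx}}{dP_z}(\omega_2, \omega_1)$ is $\mathcal{A}_z^t \times \mathcal{A}_\lambda$-measurable. Applying a general Bayes rule from [29], we obtain

$$\frac{dP_{xz}}{dP_x}(\omega_1, \omega_2) = \frac{dP_{zx}}{dP_z}(\omega_2, \omega_1).$$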

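Let us denote the family of $2\times 2$ orthogonal matrices by $M_0$ and the set of induced Borel sets by $\mathcal{B}_0$. Let $\nu_{xz}$ be the conditional measure on $(M_0, \mathcal{B}_0)$ given $\mathcal{A}_z^t$, defined by $\nu_{xz}(A, \omega_2) = P(X_\lambda^{-1}(A) \mid \mathcal{A}_z^t)(\omega_2)$, for $A \in \mathcal{B}_0$, $\omega_2 \in \Omega$. Let $\nu_x$ be the measure on $(M_0, \mathcal{B}_0)$ induced by $X_\lambda$. Then it is easily seen that

$$\frac{d\nu_{xz}}{d\nu_x}(X_\lambda(\omega_1), z^t(\omega_2)) = \frac{dP_{xz}}{dP_x}(\omega_1, \omega_2) = \frac{E_x(\theta^t \mid X_\lambda = X_\lambda(\omega_1))}{E_x(\theta^t)},$$

where $\theta^t$ is defined by (16). Summarizing what has been shown, we have the following lemma.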


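Lemma 2: Consider the observation process described by (15). The conditional probability measure $\nu_{xz}$ for the signal $X_\lambda$ given the observation $Z^t$ is then absolutely continuous with respect to $\nu_x$, the a priori measure for $X_\lambda$. Further, for $Z^t \in C_2^t$ and $X \in M_0$, one has

$$\frac{d\nu_{xz}}{d\nu_x}(X, Z^t) = \frac{E_x(\theta^t \mid X_\lambda = X)}{E_x(\theta^t)},$$

where

$$\theta^t = \exp\Big(-\frac12\int_0^t\frac{m^2(\tau)}{q(\tau)}\,d\tau + \int_0^t\frac{m(\tau)}{q(\tau)}\,[Z'(\tau)dZ(\tau)]_{12}\Big), \tag{17}$$

with $[Z'(\tau)dZ(\tau)]_{12} = \begin{bmatrix}1 & 0\end{bmatrix} Z'(\tau)dZ(\tau)\begin{bmatrix}0\\ 1\end{bmatrix}$.

Since $Z'(\tau)Z(\tau) = I$, (15) gives $[Z'(\tau)dZ(\tau)]_{12} = m(\tau)\,d\tau + q^{1/2}(\tau)\,dw(\tau) = dz(\tau)$, so (17) is simply (16) expressed in terms of $Z$.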

If the density function of $\nu_x$ exists and is denoted by $p_X(\cdot)$, then it follows from Lemma 2 that the density function $p_X(\cdot \mid Z^t)$ of $\nu_{xz}$ exists and can be expressed as follows:

$$p_X(X \mid Z^t) = \frac{E_x(\theta^t \mid X_\lambda = X)\,p_X(X)}{E_x(\theta^t)},$$

where $\theta^t$ is defined by (17). Let $x \in R^1$ be defined by $\exp(Rx) = X$ and $-\pi \le x < \pi$. Then by simple calculations,

$$p_X(X \mid Z^t) = \frac{E_x(\theta^t \mid x(\lambda) = x + 2k\pi,\ k = 0, \pm1, \pm2, \ldots)\,p_X(X)}{E_x(\theta^t)} = \sum_{k=-\infty}^{\infty}\frac{E_x(\theta^t \mid x(\lambda) = x + 2k\pi)\,p_x(x + 2k\pi)}{E_x(\theta^t)},$$

where $p_x$ denotes the density function of $x(\lambda)$. This completes the proof of the following theorem.
Theorem 5: Consider the observation process described by (15). If the density function $p_x$ of $x(\lambda)$ exists, then the conditional density function $p_X(\cdot \mid Z^t)$ exists and can be expressed as follows:

$$p_X(X \mid Z^t) = \sum_{k=-\infty}^{\infty} p_x(x + 2k\pi \mid Z^t) = \sum_{k=-\infty}^{\infty}\frac{E_x(\theta^t \mid x(\lambda) = x + 2k\pi)\,p_x(x + 2k\pi)}{E_x(\theta^t)},$$

where $\theta^t$ is defined by (17), $p_x$ denotes the density function of $x(\lambda)$, and $x$ is determined by $\exp(Rx) = X$ and the condition $-\pi \le x < \pi$.
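Theorem 5 reduces the density on $S^1$ to a sum of shifted copies of a density of the unwrapped angle. A small Python sketch of that folding step follows (NumPy assumed; `fold` and `gaussian` are hypothetical names, and truncating the infinite sum at $|k| \le$ `kmax` is our approximation). Applied to the conditional density $p_x(\cdot \mid Z^t)$ it gives $p_X(X \mid Z^t)$; here it is applied to hypothetical Gaussian densities.

```python
import numpy as np


def fold(p_x, x, kmax=50):
    """Density in the angle coordinate x in [-pi, pi) obtained from a density
    p_x on the real line: sum_{|k| <= kmax} p_x(x + 2 k pi)."""
    k = np.arange(-kmax, kmax + 1)[:, None]
    return p_x(np.asarray(x)[None, :] + 2.0 * np.pi * k).sum(axis=0)


def gaussian(mu, var):
    """Normal density with mean mu and variance var, as a function of y."""
    return lambda y: np.exp(-(y - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


x = np.linspace(-np.pi, np.pi, 1000, endpoint=False)
p_folded = fold(gaussian(mu=5.0, var=16.0), x)   # hypothetical unwrapped density
```

The folded density integrates to one over $[-\pi, \pi)$ whenever the original density integrates to one over the real line, which is what makes the right-hand side of Theorem 5 a proper density on $S^1$.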

It is appropriate to remark that one can easily derive the stochastic partial differential equation for the conditional density $p_X(X \mid Z^t)$ using Theorem 5 and the well-known equation ([31], [32]) for $p_x(x + 2k\pi \mid Z^t)$, $-\infty < k < \infty$. For economy of space, this equation will not be displayed. However, we remark that when $m(x,t)$ is periodic in $x$ with period $2\pi$, the equation is in a form similar to the Stratonovich-Kushner equation with $p_x$ replaced by $p_X$.

Using Theorem 5 and the well-known fact [30] that the smoothed and the predicted densities can be expressed explicitly in terms of filtered densities, we can easily obtain the following two corollaries.

Corollary 1: The conditional smoothed density $p_X(X \mid Z^t)$, for $0 \le \lambda \le t$, may be expressed in terms of the conditional filtered density as follows:

$$p_X(X \mid Z^t) = \sum_{k=-\infty}^{\infty} p_x(x + 2k\pi \mid Z^\lambda)\exp\Big(\int_\lambda^t\frac{\alpha_s(x + 2k\pi)}{q(s)}\,d\nu_s - \frac12\int_\lambda^t\frac{\alpha_s^2(x + 2k\pi)}{q(s)}\,ds\Big),$$

where $x$ is determined by $\exp(Rx) = X$, $-\pi \le x < \pi$, and

$$d\nu_s = [Z'(s)dZ(s)]_{12} - \hat m(s)\,ds,$$
$$\alpha_s(\xi) = \hat m(s \mid x_\lambda = \xi) - \hat m(s),$$
$$\hat m(s) = E(m(s) \mid Z^s),$$
$$\hat m(s \mid x_\lambda = \xi) = E(m(s) \mid Z^s, x_\lambda = \xi).$$

Corollary 2: Let $x$ be a Markov process with given transition density $p_X(X \mid x(t) = \xi)$. The conditional predicted density $p_X(X \mid Z^t)$, for $t \le \lambda$, may be expressed in terms of the conditional filtered density as follows:

$$p_X(X \mid Z^t) = \int_{-\infty}^{\infty} p_X(X \mid x(t) = \xi)\,p_x(\xi \mid Z^t)\,d\xi.$$

IV.  3.  Optimal  Estimation. 

In the previous subsection, the conditional probability distributions were studied. A variety of estimation problems may be studied based on those conditional distributions, but some estimation problems on the circle can be solved directly by using results in vector-space estimation theory. In this subsection, the well-established linear optimal estimation theory will be used to deduce recursive equations for optimal estimation on $S^1$ and thereby illustrate the approach.
The estimation problem with which we will mainly be concerned in this subsection is the following: Given a symmetric cost function $\phi$ defined by (1), construct a $2\times 2$ orthogonal random matrix $\hat X(\lambda \mid t)$ as a $\mathcal{B}_2^t$-measurable functional of $Z^t$ such that for all $\mathcal{A}_z^t$-measurable $2\times 2$ orthogonal random matrices $M$ one has the inequality

$$E(\phi(X(\lambda), \hat X(\lambda \mid t)) \mid Z^t) \le E(\phi(X(\lambda), M) \mid Z^t), \tag{18}$$

in which $\phi(X_1, X_2) = \phi(\theta)$, $\theta$ being determined by $\exp(R\theta) = X_1'X_2$ and the condition $-\pi \le \theta < \pi$ (i.e., $\theta$ is the angle between $X_1$ and $X_2$).

We have seen, at the beginning of this section, that a continuous random process $X$ on $S^1$ can be identified with a continuous random process $x$ on $R^1$ via the bijective mapping $X = J(x)$. We now construct a signal process $X$ on $S^1$ by injecting a linear diffusion $x$ into $S^1$, $x$ satisfying

$$dx(t) = a(t)x(t)\,dt + b^{1/2}(t)\,dv(t), \quad x(0) = 0,$$

where $b(t) > 0$ for all $t \in T$, and $v$ is a standard Brownian motion independent of the observational noise $w$. Applying the stochastic differentiation rule, we obtain the following stochastic differential equation for our signal process $X = J(x)$:

$$dX(t) = -\frac12 b(t)X(t)\,dt + X(t)R\Big\{a(t)\Big[\int_0^t\Big(\exp\int_s^t a(\tau)\,d\tau\Big)b^{1/2}(s)\,dv(s)\Big]dt + b^{1/2}(t)\,dv(t)\Big\}, \tag{19}$$
$$X(0) = I,$$

where we note that $x(t) = \int_0^t\big(\exp\int_s^t a(\tau)\,d\tau\big)b^{1/2}(s)\,dv(s)$.

The observation process to be used in this subsection is taken to be $Z$, satisfying the stochastic differential equation:

$$dZ(t) = Z(t)\begin{bmatrix}-\frac{q(t)}{2} & c(t)x(t)\\ -c(t)x(t) & -\frac{q(t)}{2}\end{bmatrix}dt + Z(t)\begin{bmatrix}0 & q^{1/2}(t)\,dw(t)\\ -q^{1/2}(t)\,dw(t) & 0\end{bmatrix}, \quad Z(0) = I. \tag{20}$$

As shown in Subsection IV.1, $Z$ can be identified with $z = J^{-1}(Z)$ satisfying

$$dz(t) = c(t)x(t)\,dt + q^{1/2}(t)\,dw(t), \quad z(0) = 0.$$

Note that the equations for $X$ and $Z$ are each bilinear in form. Moreover, $z^t$ and $Z^t$ generate the same $\sigma$-subfield $\mathcal{A}_z^t$ in $(\Omega, \mathcal{A}, P)$. Hence $E(x(\lambda) \mid \mathcal{A}_z^t)$ is both a $\mathcal{B}_1^t$-measurable functional $f_1$ of $z^t$ and a $\mathcal{B}_2^t$-measurable functional $f_2$ of $Z^t$ with

$$f_2(Z^t) = f_1(J^{-1}(Z^t)). \tag{21}$$

Let $\hat x(\lambda \mid t)$ denote both $f_1(z^t) = E(x(\lambda) \mid z^t)$ and $f_2(Z^t) = E(x(\lambda) \mid Z^t)$, which coincide.

We will first study the filtering problem, where $\lambda = t$. Then the Kalman-Bucy linear filtering theory yields immediately

$$d\hat x(t \mid t) = a(t)\hat x(t \mid t)\,dt + K(t)c(t)q^{-1}(t)\,(dz(t) - c(t)\hat x(t \mid t)\,dt),$$

where $K$ is the solution of the Riccati equation

$$\dot K(t) = 2a(t)K(t) - c^2(t)q^{-1}(t)K^2(t) + b(t), \quad K(0) = 0.$$




In view of (21), we obtain the following lemma, which not only leads to the solution of the above-stated filtering problem but also applies directly to optimal frequency demodulation [6].

Lemma 3: Let the stochastic process (19) be the signal process and the stochastic process (20) be the observation process. Then the filtering equations are

$$d\hat x(t \mid t) = a(t)\hat x(t \mid t)\,dt + K(t)c(t)q^{-1}(t)\,([Z'(t)dZ(t)]_{12} - c(t)\hat x(t \mid t)\,dt), \quad \hat x(0 \mid 0) = 0,$$

with

$$\dot K(t) = 2a(t)K(t) - c^2(t)q^{-1}(t)K^2(t) + b(t), \quad K(0) = 0,$$

and the conditional probability density is given by

$$p_x(x \mid Z^t) = \frac{1}{\sqrt{2\pi K(t)}}\exp\Big[-\frac{(x - \hat x(t \mid t))^2}{2K(t)}\Big].$$

In view of Theorem 5, we see that $p_X(X \mid Z^t)$ is a folded normal density. By Theorem 2, it follows that $p_X(X \mid Z^t)$ is unimodal with mode at $\exp[\hat x(t \mid t)R]$ and is symmetric about it. We may now conclude from Theorem 1 that for a cost function defined by (1),

$$E(\phi(X(t), \exp[\hat x(t \mid t)R]) \mid Z^t) \le E(\phi(X(t), M) \mid Z^t)$$

for any $\mathcal{A}_z^t$-measurable $2\times 2$ orthogonal random matrix $M$. Since $\exp[\hat x(t \mid t)R]$ is easily seen to be a $\mathcal{B}_2^t$-measurable functional of $Z^t$, it follows that the optimal estimate of our signal process is $\hat X(t \mid t) = \exp[\hat x(t \mid t)R]$.

Differentiating this with respect to $t$ yields

$$d\hat X(t \mid t) = -\frac12 K^2(t)c^2(t)q^{-1}(t)\hat X(t \mid t)\,dt + \hat X(t \mid t)R\Big\{(a(t) - K(t)c^2(t)q^{-1}(t))\Big[\int_0^t\Big(\exp\int_s^t(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau))\,d\tau\Big)K(s)c(s)q^{-1}(s)\,[Z'(s)dZ(s)]_{12}\Big]dt + K(t)c(t)q^{-1}(t)\,[Z'(t)dZ(t)]_{12}\Big\}.$$


Summarizing what has been shown, we obtain the following theorem.

Theorem 6: If the signal process $X$ and the observation process $Z$ on $S^1$ satisfy the following stochastic differential equations:

$$dX(t) = -\frac12 b(t)X(t)\,dt + X(t)R\Big\{a(t)\Big[\int_0^t\Big(\exp\int_s^t a(\tau)\,d\tau\Big)b^{1/2}(s)\,dv(s)\Big]dt + b^{1/2}(t)\,dv(t)\Big\}, \quad X(0) = I,$$

$$dZ(t) = Z(t)\begin{bmatrix}-\frac{q(t)}{2} & c(t)\int_0^t[X'(s)dX(s)]_{12}\\ -c(t)\int_0^t[X'(s)dX(s)]_{12} & -\frac{q(t)}{2}\end{bmatrix}dt + Z(t)\begin{bmatrix}0 & q^{1/2}(t)\,dw(t)\\ -q^{1/2}(t)\,dw(t) & 0\end{bmatrix}, \quad Z(0) = I,$$

where $w$ and $v$ are independent standard Brownian motions on $R^1$, then the optimal estimate $\hat X(t \mid t)$ in the sense of (18) satisfies the following stochastic differential equations:

$$d\hat X(t \mid t) = -\frac12 K^2(t)c^2(t)q^{-1}(t)\hat X(t \mid t)\,dt + \hat X(t \mid t)R\Big\{(a(t) - K(t)c^2(t)q^{-1}(t))\Big[\int_0^t\Big(\exp\int_s^t(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau))\,d\tau\Big)K(s)c(s)q^{-1}(s)\,[Z'(s)dZ(s)]_{12}\Big]dt + K(t)c(t)q^{-1}(t)\,[Z'(t)dZ(t)]_{12}\Big\}, \tag{22}$$
$$\dot K(t) = 2a(t)K(t) - c^2(t)q^{-1}(t)K^2(t) + b(t), \quad K(0) = 0. \tag{23}$$

The conditional probability density is given by

$$p_X(X \mid Z^t) = \frac{1}{\sqrt{2\pi K(t)}}\sum_{k=-\infty}^{\infty}\exp\Big[-\frac{(x + 2k\pi - \hat x(t \mid t))^2}{2K(t)}\Big] = \frac{1}{2\pi} + \frac{1}{\pi}\sum_{k=1}^{\infty}\exp\Big[-\frac{k^2K(t)}{2}\Big]\cos k(x - \hat x(t \mid t)),$$

where $x$ is defined by $\exp(Rx) = X$ and $-\pi \le x < \pi$.
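The two expressions for the conditional density in Theorem 6, the folded Gaussian sum and its Fourier series with coefficients $e^{-k^2K(t)/2}$, can be compared numerically. The Python sketch below (NumPy assumed; function names are hypothetical, and both infinite sums are truncated at `kmax` terms) evaluates both forms for several values of $K(t)$.

```python
import numpy as np


def folded_normal_sum(x, xhat, K, kmax=30):
    """First form: (2 pi K)^{-1/2} sum_k exp(-(x + 2 k pi - xhat)^2 / (2K))."""
    k = np.arange(-kmax, kmax + 1)[:, None]
    d = x[None, :] + 2.0 * np.pi * k - xhat
    return np.exp(-d**2 / (2.0 * K)).sum(axis=0) / np.sqrt(2.0 * np.pi * K)


def folded_normal_fourier(x, xhat, K, kmax=30):
    """Second form: 1/(2 pi) + (1/pi) sum_{k>=1} exp(-k^2 K / 2) cos k(x - xhat)."""
    k = np.arange(1, kmax + 1)[:, None]
    terms = np.exp(-k**2 * K / 2.0) * np.cos(k * (x[None, :] - xhat))
    return 1.0 / (2.0 * np.pi) + terms.sum(axis=0) / np.pi


x = np.linspace(-np.pi, np.pi, 721, endpoint=False)
```

In practice the sum form converges quickly when $K(t)$ is small (sharply peaked density) and the Fourier form when $K(t)$ is large (nearly uniform density), so either can be chosen depending on the current error variance.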

Some of the expected errors $E(\phi(X(t), \hat X(t \mid t)))$ of the optimal estimate $\hat X(t \mid t)$ can be obtained immediately from Theorem 4.

We note that the optimal filtering equations (22) and (23) are complex in form. Conceptually, however, the filtering procedure is quite simple: the observation process $dZ$ first goes through a nonlinear transformer. The transformed process $[Z'dZ]_{12}$ goes through a Kalman-Bucy linear filter. Then we inject the filtered process $\hat x(t \mid t)$ into $S^1$ via the injection mapping $J$. The output $\hat X(t \mid t)$ of the nonlinear injector is the desired estimate.
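The three-stage structure just described (nonlinear transformer, Kalman-Bucy filter, nonlinear injector) can be sketched on sampled data. The Python sketch below assumes NumPy, constant coefficients $a, b, c, q$, a uniform time step with Euler discretization, and hypothetical function names and parameter values. On samples, $[Z'dZ]_{12}$ is approximated by $[Z'(t_k)(Z(t_{k+1}) - Z(t_k))]_{12} = \sin(z(t_{k+1}) - z(t_k))$, which agrees with the increment of $z$ up to terms of third order.

```python
import numpy as np


def J(angle):
    """Injection x -> exp(xR), R = [[0, 1], [-1, 0]], vectorized over samples."""
    c, s = np.cos(angle), np.sin(angle)
    return np.stack([np.stack([c, s], -1), np.stack([-s, c], -1)], -2)


def preprocess(Z):
    """Nonlinear transformer: increments [Z'(t_k)(Z(t_{k+1}) - Z(t_k))]_{12}."""
    dZ = Z[1:] - Z[:-1]
    return np.einsum("kji,kjl->kil", Z[:-1], dZ)[:, 0, 1]


def circle_filter(Z, a, b, c, q, dt):
    """Euler discretization of Lemma 3 with constant a, b, c, q. Returns the
    Kalman-Bucy estimate xhat(t|t), the Riccati solution K(t), and the
    injected estimate Xhat(t|t) = exp(xhat(t|t) R)."""
    dzeta = preprocess(Z)
    xhat = np.zeros(len(Z))
    K = np.zeros(len(Z))
    for k, dz in enumerate(dzeta):
        xhat[k + 1] = xhat[k] + a * xhat[k] * dt + K[k] * c / q * (dz - c * xhat[k] * dt)
        K[k + 1] = K[k] + (2 * a * K[k] - c**2 * K[k] ** 2 / q + b) * dt
    return xhat, K, J(xhat)


# Simulated signal (19) and observation (20); parameter values are hypothetical.
rng = np.random.default_rng(0)
a, b, c, q = -1.0, 1.0, 1.0, 0.01
dt, n = 2e-3, 50_000
x = np.zeros(n + 1)
z = np.zeros(n + 1)
dv = rng.normal(0.0, np.sqrt(dt), n)
dw = rng.normal(0.0, np.sqrt(dt), n)
for k in range(n):
    x[k + 1] = x[k] + a * x[k] * dt + np.sqrt(b) * dv[k]
    z[k + 1] = z[k] + c * x[k] * dt + np.sqrt(q) * dw[k]
Z = J(z)  # only the matrix-valued observations are passed to the filter
xhat, K, Xhat = circle_filter(Z, a, b, c, q, dt)
```

Only the matrix-valued path `Z` enters `circle_filter`; the scalar `z` is used just to generate it. The gain computation is the ordinary scalar Riccati recursion, which is the sense in which the circle problem reduces to a vector-space estimator.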

The same approach can be used to solve the smoothing and prediction problems. The solution to the prediction problem is trivial and hence omitted here. For the smoothing problem, we first recall [33] that for $0 \le \lambda \le t$,

$$\hat x(\lambda \mid t) = \hat x(\lambda \mid \lambda) + K(\lambda)\int_\lambda^t\Big(\exp\int_\lambda^s(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau))\,d\tau\Big)c(s)q^{-1}(s)\,(dz(s) - c(s)\hat x(s \mid s)\,ds).$$


By (21), it follows that

$$\hat x(\lambda \mid t) = \hat x(\lambda \mid \lambda) + K(\lambda)\int_\lambda^t\Big(\exp\int_\lambda^s(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau))\,d\tau\Big)c(s)q^{-1}(s)\,([Z'(s)dZ(s)]_{12} - c(s)\hat x(s \mid s)\,ds). \tag{24}$$

We note that the conditional probability distribution of $x(\lambda)$ given $Z^t$ is Gaussian. From Theorem 5, it follows that $p_X(X \mid Z^t)$ is a folded-Gaussian density and hence unimodal. As in the filtering case,

$$\hat X(\lambda \mid t) = \exp(\hat x(\lambda \mid t)R). \tag{25}$$

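Substituting (24) into (25) thus yields

$$\hat X(\lambda \mid t) = \hat X(\lambda \mid \lambda)\exp\Big\{RK(\lambda)\int_\lambda^t\Big(\exp\int_\lambda^s(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau))\,d\tau\Big)c(s)q^{-1}(s)\Big([Z'(s)dZ(s)]_{12} - c(s)\int_0^s[\hat X'(\tau \mid \tau)\,d\hat X(\tau \mid \tau)]_{12}\,ds\Big)\Big\},$$

where we have used the identity $\hat x(s \mid s) = \int_0^s[\hat X'(\tau \mid \tau)\,d\hat X(\tau \mid \tau)]_{12}$ and the fact that $\exp(\alpha R)\exp(\beta R) = \exp((\alpha + \beta)R)$.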

0 

Summarizing what has been shown, we obtain the following theorem.

Theorem 7: If the signal process and the observation process are the same as in Theorem 6, then the optimal estimate $\hat X(\lambda \mid t)$, $0 \le \lambda \le t$, in the sense of (18), is given by

$$\hat X(\lambda \mid t) = \hat X(\lambda \mid \lambda)\exp\Big\{RK(\lambda)\int_\lambda^t\Big(\exp\int_\lambda^s(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau))\,d\tau\Big)c(s)q^{-1}(s)\Big([Z'(s)dZ(s)]_{12} - c(s)\int_0^s[\hat X'(\tau \mid \tau)\,d\hat X(\tau \mid \tau)]_{12}\,ds\Big)\Big\},$$

where $\hat X(t \mid t)$ and $K(t)$ can be obtained from (22) and (23).

The conditional probability density of $X(\lambda)$ given $Z^t$, the expected errors $E(\phi(X(\lambda), \hat X(\lambda \mid t)))$, and the stochastic equations for $\hat X(\lambda \mid t)$ for fixed-point smoothing, fixed-lag smoothing, and fixed-interval smoothing can all be easily obtained by straightforward computations, which are left to the interested reader.
IV. 4. Random Initial State.

In the previous subsections, the initial state of the signal process $X$ has been assumed to be $X(0) = I$, the identity matrix. This is obviously not a practical assumption in some applications. In this subsection we will consider the case in which the initial state is a random variable. We will denote the signal process by $Y$ in this subsection, and assume that $Y(0) = Y_0$ is a random variable independent of the observational noise $w$.

We observe that the input to the observation process (15) at time $t$ is not the dynamical state of the signal. It is the angle that the rotational process represented by the signal has swept over the time interval $[0,t]$. Taking this viewpoint, our present problem can be solved through the previous ideas with some modification.

Let $y(t)$ denote the angle through which the signal $Y$ has swept during $[0,t]$. It is easily seen that

$$y(t) = \int_0^t[Y'(s)dY(s)]_{12}.$$

Define a rotational process $X$ by

$$X(t) = Y_0^{-1}Y(t).$$

Then $X(0) = I$ and, as before, we may define

$$x(t) = (J^{-1}(X))(t) = \int_0^t[X'(s)dX(s)]_{12}.$$

Note that $x(t) = y(t)$. In other words, the angles swept by $X$ and by $Y$ over $[0,t]$ are the same. Hence (15) can also be used as the observation process for our present problem. The conditional distribution of $X(\lambda)$, given observation of the form given in (15), can be determined by application of the previous results.
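The key observation, that replacing $Y$ by $X = Y_0^{-1}Y$ leaves the swept angle unchanged, is easy to check on a sampled path. The Python sketch below (NumPy assumed; the random-walk angle path, the time step, and all names are hypothetical) computes the discrete swept angle $\sum_k [Y'(t_k)Y(t_{k+1})]_{12}$ for both $Y$ and $X$.

```python
import numpy as np


def J(angle):
    """exp(angle * R) with R = [[0, 1], [-1, 0]], vectorized over samples."""
    c, s = np.cos(angle), np.sin(angle)
    return np.stack([np.stack([c, s], -1), np.stack([-s, c], -1)], -2)


def swept_angle(Y):
    """Discrete version of y(t) = int_0^t [Y'(s) dY(s)]_{12}: accumulate the
    (1,2) entries of Y'(t_k) Y(t_{k+1}), i.e. the sines of the angle increments."""
    steps = np.einsum("kji,kjl->kil", Y[:-1], Y[1:])[:, 0, 1]
    return np.concatenate([[0.0], np.cumsum(steps)])


rng = np.random.default_rng(1)
n, dt = 10_000, 1e-4
x = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
Y0 = J(rng.uniform(-np.pi, np.pi))   # hypothetical random initial orientation
Y = Y0 @ J(x)                         # Y(t) = Y_0 exp(x(t) R)
X = Y0.T @ Y                          # X(t) = Y_0^{-1} Y(t)
```

Because $Y_0$ is orthogonal, $Y'(t_k)Y(t_{k+1}) = X'(t_k)X(t_{k+1})$ sample by sample, so the observation model (15) is unaffected by the unknown initial orientation.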
We note that $Y_0$ and $X(\lambda)$ are conditionally independent given $Z^t$. If the distribution of $Y_0$ and the conditional distribution of $X(\lambda)$ given $Z^t$ are both folded normal, then the following lemma easily leads to the conclusion that $\hat Y(\lambda \mid t)$, the optimal estimate of $Y(\lambda)$ given $Z^t$, is equal to $\hat Y_0\hat X(\lambda \mid t)$. Here $\hat Y_0$ is the mode of the distribution of $Y_0$ and $\hat X(\lambda \mid t)$ is the mode of the conditional distribution of $X(\lambda)$ given $Z^t$.

Lemma 4: Let $A$ and $B$ be two independent random $2\times 2$ orthogonal matrices, each of which has a folded normal distribution with modes $\hat A$ and $\hat B$, respectively. Then $AB$ is a random $2\times 2$ orthogonal matrix which has a folded normal distribution with mode $\hat A\hat B$.

Proof. It is easily seen that there exist unique real-valued normal random variables $a$ and $b$ such that $Ea$, $Eb$ are in $[-\pi, \pi)$ with $A = \exp(Ra)$ and $B = \exp(Rb)$. Then $AB = \exp(R(a+b))$. Obviously $a + b$ is a normal random variable, since $a$ and $b$ are independent. Hence $AB$ is folded normal and the mode of $AB$ is $\exp[RE(a+b)] = \exp[RE(a)]\exp[RE(b)] = \hat A\hat B$.
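Lemma 4 can be illustrated by sampling. The Python sketch below (NumPy assumed; the means, variances, sample size, and names are hypothetical) draws independent normal angles $a$ and $b$, forms the angle of $AB = \exp(R(a+b))$, and compares its mode with the product of the individual modes. For a folded normal distribution the mode coincides with the circular mean direction, which is easier to estimate from samples than a histogram peak.

```python
import numpy as np


def rot(theta):
    """exp(theta R) with R = [[0, 1], [-1, 0]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])


def wrap(theta):
    """Fold an angle into [-pi, pi)."""
    return (theta + np.pi) % (2.0 * np.pi) - np.pi


rng = np.random.default_rng(2)
mu_a, mu_b = 2.5, 1.9                 # hypothetical means E a, E b in [-pi, pi)
a = rng.normal(mu_a, 0.4, 200_000)
b = rng.normal(mu_b, 0.7, 200_000)    # independent of a
ab = wrap(a + b)                      # angle of AB = exp(R(a + b))
# Circular mean direction of the samples (equal to the mode for a folded normal).
mode_est = np.arctan2(np.sin(ab).mean(), np.cos(ab).mean())
mode_pred = wrap(mu_a + mu_b)         # angle of the product of the two modes
```

Note that $\mu_a + \mu_b$ lies outside $[-\pi, \pi)$ here, so the predicted mode is obtained only after folding, exactly as $\exp[RE(a+b)]$ folds it in the proof.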

IV.  5.  Multichannel  Estimation. 

The  results  of  the  previous  subsections  can  be  extended  to  the  large 
class  of  problems  involving  processes  evolving  on  abelian  Lie  groups. 

It is well known [34] that a given connected abelian Lie group $G$ is isomorphic to the direct product of a number of copies of the circle and a number of copies of the real line, i.e.,

$$G \cong R^n \times (S^1)^m,$$
where $(S^1)^m$ is usually called a "torus." The diffusion processes on this type of space have been used to model some interesting satellite and pendulum systems in [54]. Analogous to (14), a bijective mapping $J_{nm}: (C_1^s)^{n+m} \to (C_1^s)^n \times (C_2^s)^m$ is defined by

$$(J_{nm}(a))(t) = [a_1(t), \ldots, a_n(t), (J(a_{n+1}))(t), \ldots, (J(a_{n+m}))(t)]$$

for $a \in (C_1^s)^{n+m}$, $a_i$ being the $i$th component of $a$. Thus a continuous random signal process on $G$ which is described by an $\mathcal{A}$-measurable function $X: \Omega \to (C_1^s)^n \times (C_2^s)^m$ corresponds to a unique continuous random signal process on $R^{n+m}$ which is described by an $\mathcal{A}$-measurable function $x: \Omega \to (C_1^s)^{n+m}$ such that

$$X(t) = (J_{nm}(x))(t), \quad t \in [0,s].$$

The mathematical model for the sensor can be obtained by first using $J_{nm}$ to inject the following vector random differential equation into $R^n \times (S^1)^m$:

$$dz(t) = m(x(t),t)\,dt + dv(t), \quad z(0) = 0, \tag{26}$$

and then differentiating $Z(t) = (J_{nm}(z))(t)$ by the stochastic differentiation rule to obtain a set of stochastic differential equations. The first $n$ of these equations are the same as the first $n$ equations of (26), and the last $m$ equations are bilinear $2\times 2$ matrix differential equations having the form (15). This calculation is straightforward, and so we will not display the resulting equations. Because of the bijective property of $J_{nm}$, it is clear that the estimation analysis in the previous subsections can be easily generalized to this general abelian case with little modification. For the special case in which $x$ is a linear diffusion and $m(x(t),t)$ is a linear function of $x(t)$, what has been shown is that the applicability of the celebrated Kalman-Bucy filter includes estimation on abelian Lie groups.


V.  DISCRETE-TIME  ESTIMATION  ON  COMPACT  LIE  GROUPS 

The results of Section III can easily be generalized to problems on compact non-abelian Lie groups by introducing a similar exponential Fourier density (EFD) on the group. This density is obtained by using a sequence of irreducible unitary representations which form a complete orthogonal system on the compact group. It can be shown that a continuous density function on the group can be approximated as closely as we wish in the space of square-integrable functions by such an EFD.

As in the circle case, a consequence of the group structure is that the class of EFDs of a certain finite order on the compact Lie group is closed under the operation of taking conditional distributions. It will become clear in the sequel that it is exactly this closure property of the EFDs that yields simple estimation schemes in which the sequential conditional densities are updated by recursively revising a fixed finite number of parameters.

In order to illustrate how the conditional density can be used to calculate the optimal estimate on the group, a rigid-body attitude estimation problem is solved as an example. The error criterion, the optimal estimate, and the estimation error with respect to the criterion will be discussed for a given probability distribution.
V. 1. Compact Lie Groups and Their Matrix Representations.

We begin by summarizing some definitions and preliminary results to be used in this section. The reader is referred to [35]-[37] for details.

Definition. A differential manifold $M$ of dimension $n$ is a Hausdorff topological space with the following properties: (a) For every element $m \in M$ there are an open set $U$ containing it and a homeomorphism $y: U \to V \subset R^n$, called a chart. The set $V$ is called a parameter domain. The components of the vector $y(m)$ are called the coordinates of $m$. (b) For any two charts $y_1$ and $y_2$, defined on $U_1$ and $U_2$, the composition $y_2 \circ y_1^{-1}$, defined on $y_1(U_1 \cap U_2)$, is smooth (i.e., infinitely differentiable).

Definition. A Lie group $G$ is both a differential manifold and a group, which is closed and connected, such that the group operations are smooth in coordinates. If the group is covered by a finite number of bounded parameter domains through their charts, then the group is said to be compact.

Definition. An $m\times m$ matrix representation of a Lie group $G$ is a subgroup $\Gamma$ of the nonsingular $m\times m$ matrices together with a homomorphic smooth mapping $D$ of $G$ onto $\Gamma$. That is, for each $a, b \in G$, there is an element $D(a) \in \Gamma$ such that (a) $D(a)D(b) = D(ab)$, (b) $D(e) = I$, and (c) $D(a^{-1}) = [D(a)]^{-1}$. We write $\dim D = m$. The representation is said to be unitary if each matrix in $\Gamma$ is unitary. Two such representations $D^1$ and $D^2$ are called equivalent if there is a nonsingular $m\times m$ matrix $\psi$ such that $\psi D^1(a) = D^2(a)\psi$ for each $a \in G$. A reducible representation is one that is equivalent to the block form
$$D(a) = \begin{bmatrix}D^1(a) & C(a)\\ 0 & D^2(a)\end{bmatrix},$$

where $D^1$ and $D^2$ can be shown to constitute representations. If a representation is equivalent to such a block form with $C = 0$, it is called completely reducible. It can be shown that a reducible unitary representation must be completely reducible.


Definition. We delete from some of the parameter domains their intersection with others so that the points of the resulting domains are in 1-1 correspondence with the group elements. Then the integral $\int f(x)w(x)\,dx$ of the function $f$ with respect to the weight function $w$ is well defined. It can be shown that a weight function $w$, unique except for a normalizing factor, can be found such that this integral is left invariant, i.e., $\int f(p)w(p)\,dp = \int f(ap)w(p)\,dp$ for any continuous function $f$ and any group element $a$. On a compact Lie group the integral is also right invariant and is written as $\int f(g)\,dg$.


Theorem 8. Let $D^1(a), D^2(a), \ldots$ be a family of inequivalent irreducible unitary representations of a compact Lie group. The matrix elements of these representations satisfy the orthogonality relations

$$\int D^i_{lm}(g)\, D^j_{kn}(g)^*\, dg = \frac{\int dg}{\dim D^i}\, \delta_{ij}\, \delta_{lk}\, \delta_{mn}.$$
Theorem 9. (Peter-Weyl) A continuous function on a compact Lie
group  can  be  uniformly  approximated  by  a linear  combination  of  the 
matrix  elements  of  the  unitary  irreducible  representations  of 
the  group. 
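As a concrete check of Theorems 8 and 9, the sketch below uses the simplest compact Lie group, the circle group SO(2) parametrized by θ ∈ [0, 2π), whose irreducible unitary representations are the one-dimensional $D^k(\theta) = e^{ik\theta}$. The choice of group, the grid size and the test function $e^{\cos\theta}$ are illustrative assumptions, not part of the original development, and the function names are hypothetical. It evaluates the orthogonality integral of Theorem 8 (here $\int dg = 2\pi$ and $\dim D^k = 1$) and shows the uniform error of truncated expansions in the $D^k$ shrinking, as Theorem 9 leads one to expect.

```python
import numpy as np

# Circle group G = SO(2), points theta in [0, 2*pi), Haar measure d(theta) with total volume 2*pi.
# Its irreducible unitary representations are one-dimensional: D^k(theta) = exp(i*k*theta).
K = 256                                   # number of quadrature points on a uniform grid
theta = 2 * np.pi * np.arange(K) / K
dg = 2 * np.pi / K                        # quadrature weight per grid point


def D(k, th):
    """Matrix element (a 1 x 1 matrix) of the k-th irreducible representation."""
    return np.exp(1j * k * th)


def inner(k1, k2):
    """Quadrature for the integral of D^k1(g) * conj(D^k2(g)) dg (exact for |k1 - k2| < K)."""
    return np.sum(D(k1, theta) * np.conj(D(k2, theta))) * dg


def fourier_partial_sum(f_vals, N):
    """Expansion of f in the D^k with |k| <= N, coefficients estimated by the FFT."""
    coeffs = np.fft.fft(f_vals) / K                      # coeffs[k] ~ (1/2pi) * int f conj(D^k)
    ks = np.rint(np.fft.fftfreq(K, d=1.0 / K)).astype(int)
    keep = np.abs(ks) <= N
    return np.real(np.exp(1j * np.outer(theta, ks[keep])) @ coeffs[keep])


# Theorem 8 on this group: int D^3 conj(D^3) dg = 2*pi / dim D^3 = 2*pi, and 0 for D^3 vs D^5.
same = inner(3, 3)
different = inner(3, 5)

# Theorem 9 on this group: uniform error of truncated expansions of f(theta) = exp(cos(theta)).
f = np.exp(np.cos(theta))
sup_errors = [np.max(np.abs(f - fourier_partial_sum(f, N))) for N in (1, 2, 4, 8)]
```

On this group the Peter-Weyl approximation reduces to ordinary Fourier series, which is why an FFT suffices; on a non-abelian group the same role is played by the matrix elements $D^\ell_{mn}$ of higher-dimensional representations.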


V.  2.  Exponential  Fourier  Densities  on  a Compact  Lie  Group. 


Let us denote by $D^1, D^2, \ldots$ a collection of irreducible, inequivalent, and unitary matrix representations of a compact Lie group G, which are of dimensions $n_1, n_2, \ldots$, respectively, and by $D^0 \equiv 1$ the trivial representation. We define an exponential Fourier density of order N, to be denoted by EFD(N), on G as a probability density of the form

$$p(a) = \exp \sum_{\ell=0}^{N} \sum_{i,j=1}^{n_\ell} a^\ell_{ij} D^\ell_{ij}(a) = \exp\Big( \sum_{\ell=1}^{N} \sum_{i,j=1}^{n_\ell} a^\ell_{ij} D^\ell_{ij}(a) + a^0_{00} \Big),$$

where $a^0_{00}$ is a normalizing constant and all other coefficients $a^\ell_{ij}$ are arbitrary complex numbers. The double summation notation above will be abbreviated by $\Sigma$. The norm of a function f in $L^2(G)$ will be denoted by $\|f\| = \big(\int |f(g)|^2\, dg\big)^{1/2}$.
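To make the definition concrete, the sketch below evaluates an EFD(N) on the circle group, assuming $D^k(\theta) = e^{ik\theta}$ as its irreducible representations. Because the text allows arbitrary complex coefficients, we additionally assume, for this illustration only, that the coefficient of $D^{-k}$ is the complex conjugate of that of $D^k$, so the exponent is real and p is a genuine density. The normalizing constant $a^0_{00}$ is then obtained by numerical quadrature. The function names are hypothetical.

```python
import numpy as np


def efd_circle(a, theta, a00=0.0):
    """exp(a00 + sum_{k=1}^N [a_k e^{ik theta} + conj(a_k) e^{-ik theta}]) on the circle group.

    a[k-1] is the coefficient of D^k(theta) = e^{ik theta}. Pairing it with its conjugate on
    D^{-k} (an assumption of this sketch) keeps the exponent real.
    """
    k = np.arange(1, len(a) + 1)
    exponent = 2.0 * np.real(np.exp(1j * np.outer(theta, k)) @ a)
    return np.exp(a00 + exponent)


def normalizing_constant(a, K=512):
    """a00 = -ln of the integral over [0, 2pi) of the unnormalized EFD (periodic trapezoid rule)."""
    theta = 2 * np.pi * np.arange(K) / K
    return -np.log(np.sum(efd_circle(a, theta)) * 2 * np.pi / K)


# Example: an EFD(2) with two complex coefficients.
a = np.array([0.5 + 0.2j, -0.3j])
a00 = normalizing_constant(a)
grid = 2 * np.pi * np.arange(1000) / 1000
total_mass = np.sum(efd_circle(a, grid, a00)) * 2 * np.pi / 1000
```

The periodic trapezoid rule is used because it converges very quickly for smooth periodic integrands such as an exponential of a trigonometric polynomial.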


Theorem 10. Let p be a probability density on a compact Lie group G. Assume that p is continuous. Then for any given positive number ε, there exists an EFD, $p_\epsilon = \exp \Sigma\, a^\ell_{ij} D^\ell_{ij}$, such that $\|p - p_\epsilon\| < \epsilon$.


Proof: Assume that

$$\inf\{p(x) : x \in G\} = c > 0. \qquad (27)$$

This assumption will be removed later. We note that $f(x) = \ln p(x)$ is then well defined and also continuous on G.

Since G is compact, in view of the Peter-Weyl Theorem, for any $0 < \delta < 1$ there is a linear combination of the $D^\ell_{mn}$, say $f_\delta = \Sigma\, a^\ell_{mn}(\delta) D^\ell_{mn}$, such that $\|f_\delta - f\|_\infty < \delta$. It follows that $\|f_\delta\|_\infty < 1 + \|f\|_\infty = M$.

Define a function $g: \mathbb{R} \to \mathbb{R}$ by

$$g(x) = \exp \min\{x, M\} = \begin{cases} \exp x, & x \le M, \\ \exp M, & x > M, \end{cases}$$

and an operator $\hat g$ on the set of real functions on G by $(\hat g u)(x) = g(u(x))$. It is obvious that g satisfies the Carathéodory conditions [38, p. 20] and $\hat g$ transforms every function in $L^2(G)$ into a function in $L^2(G)$. By Theorem 2.1 of [38, p. 22], the operator $\hat g$ is continuous. Hence, given any $\epsilon > 0$, there exists a $\delta > 0$ such that if $\|f_\delta - f\| < \delta$, then $\|\hat g f_\delta - \hat g f\| < \epsilon$. Since both $f_\delta$ and f are bounded above by M, we have $\hat g f_\delta = \exp f_\delta$ and $\hat g f = \exp f = p$, so that $\|\exp f_\delta - p\| = \|\hat g f_\delta - \hat g f\| < \epsilon$.

Now let us remove the assumption (27) and assume that $\inf\{p(x) : x \in G\} = 0$. Let ε be an arbitrary positive number. Set $\epsilon_1 = \epsilon_2 = \epsilon/2$ and $p_1(x) = p(x) + \epsilon_1/V$, where V is the volume of G. The function $p_1$ satisfies (27). Hence there is an EFD(N), $\exp f$, such that $\|\exp f - p_1\| \le \epsilon_2$. By the Minkowski inequality,

$$\|\exp f - p\| \le \|\exp f - p_1\| + \|p_1 - p\| \le \epsilon_2 + \epsilon_1 V^{-1/2}. \qquad (28)$$

We observe that the right-hand side of (28) can be made arbitrarily small by setting ε sufficiently small. This observation completes the proof of the theorem.
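The constructive idea in the first half of the proof (approximate $f = \ln p$ by a finite harmonic expansion $f_\delta$, then take $\exp f_\delta$) can be tried numerically. The sketch below assumes the circle group with $D^k(\theta) = e^{ik\theta}$ and estimates coefficients by the FFT on a uniform grid. Truncated Fourier partial sums stand in for the abstract Peter-Weyl approximation, and renormalization plays the role of $a^0_{00}$. The test density $p \propto e^{\cos\theta}(1.5 + \sin 2\theta)$ is an arbitrary positive choice that satisfies (27), and the function names are hypothetical. The $L^2$ error should fall as the order N grows.

```python
import numpy as np

K = 1024                                   # uniform grid on the circle group
theta = 2 * np.pi * np.arange(K) / K
dtheta = 2 * np.pi / K


def l2_norm(f):
    return np.sqrt(np.sum(np.abs(f) ** 2) * dtheta)


def efd_approximation(p_vals, N):
    """Keep the harmonics |k| <= N of f = ln p, exponentiate, and renormalize."""
    f = np.log(p_vals)                               # well defined because inf p > 0, cf. (27)
    coeffs = np.fft.fft(f)
    ks = np.fft.fftfreq(K, d=1.0 / K)
    coeffs[np.abs(ks) > N] = 0.0                     # truncated expansion f_N of ln p
    f_N = np.real(np.fft.ifft(coeffs))
    q = np.exp(f_N)
    return q / (np.sum(q) * dtheta)                  # renormalization plays the role of a00


# A positive, continuous test density on the circle (an arbitrary choice).
p = np.exp(np.cos(theta)) * (1.5 + np.sin(2 * theta))
p /= np.sum(p) * dtheta
errors = [l2_norm(p - efd_approximation(p, N)) for N in (2, 4, 8, 16)]
```

For densities that vanish somewhere, the second half of the proof suggests first adding a small constant, as with $p_1$, before taking the logarithm.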
V. 3. Estimation for Processes with Noise on the Lie Group.


Suppose that the signal process $s_k$ and the measurement process $m_k$ both evolve on a compact Lie group G and are related by

$$m_k = v_k \circ s_k, \qquad (30)$$

where $v_k$ is a noise process, also evolving on G, and ∘ denotes the group operation. Our reason for writing v to the left of s is to be consistent with the corresponding matrix equation obtained by the use of the orthogonal matrix representation: the matrix S representing s is premultiplied by the matrix V representing v to obtain M = VS.

We now consider a signal process $s_k$ which is governed by the equation

$$s_{k+1} = w_k \circ s_k, \qquad (31)$$

where $w_k$ is a sequence of known elements of G. If $s_0$ is a random variable taking values on G, an interesting estimation problem is to find an effective way to recursively compute the conditional density of $s_k$ given the set of measurements, $m^k = \{m_1, \ldots, m_k\}$, k = 1, 2, ....

The EFD's introduced previously are ideal to use in solving this problem on many compact Lie groups such as the three dimensional rotation group, SO(3). However, for a reason to be discussed later, it is more convenient to include the complex conjugates of the harmonic functions in the EFD(N). Thus an EFD(N) in this subsection will be a density function of the form

$$p(a) = \exp \Sigma \big( a^{1\ell}_{mn} D^\ell_{mn}(a) + a^{2\ell}_{mn} D^{\ell *}_{mn}(a) \big).$$
Suppose $s_0$ and $v_k$ have EFD(N)'s (if they have different orders, we can let N be the maximum order and, by inserting zero coefficients, make all densities of order N) which are described, respectively, by

$$p(s_0) = \exp \Sigma \big( a^{1\ell,0}_{mn} D^\ell_{mn}(s_0) + a^{2\ell,0}_{mn} D^{\ell *}_{mn}(s_0) \big), \qquad (32)$$

$$p(v_k) = \exp \Sigma \big( b^{1\ell}_{mn} D^\ell_{mn}(v_k) + b^{2\ell}_{mn} D^{\ell *}_{mn}(v_k) \big). \qquad (33)$$

We claim that if the conditional probability densities, $p(s_k|m^k)$, k = 1, 2, ..., are all EFD(N)'s, then we need only keep track of a fixed finite number, $\sum_{i=1}^{N} n_i^2$, of parameters for updating the conditional densities. The proof is by mathematical induction.

For k = 0, $p(s_k|m^k)$ is obviously an EFD(N), as $p(s_0|m^0) = p(s_0)$. Let us assume that the conditional density $p(s_{k-1}|m^{k-1})$ is an EFD(N), denoted by

$$p(s_{k-1}|m^{k-1}) = \exp \Sigma \big( a^{1\ell,k-1}_{mn} D^\ell_{mn}(s_{k-1}) + a^{2\ell,k-1}_{mn} D^{\ell *}_{mn}(s_{k-1}) \big). \qquad (34)$$

We will now show that $p(s_k|m^k)$ is also an EFD(N) and at the same time exhibit a recursive formula for the Fourier coefficients $a^{1\ell,k}_{mn}$ and $a^{2\ell,k}_{mn}$.

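From (30), (31) and the group property of G, $v_k$ and $s_{k-1}$ can be expressed as

$$v_k = m_k \circ s_k^{-1}, \qquad s_{k-1} = w_{k-1}^{-1} \circ s_k.$$

Thus, using (34), $p(s_k|m^{k-1})$ is an EFD(N):

$$\begin{aligned} p(s_k|m^{k-1}) &= \exp \Sigma \big( a^{1\ell,k-1}_{mn} D^\ell_{mn}(w_{k-1}^{-1} \circ s_k) + a^{2\ell,k-1}_{mn} D^{\ell *}_{mn}(w_{k-1}^{-1} \circ s_k) \big) \\ &= \exp \Sigma \Big[ a^{1\ell,k-1}_{mn} \sum_{i=1}^{n_\ell} D^\ell_{mi}(w_{k-1}^{-1}) D^\ell_{in}(s_k) + a^{2\ell,k-1}_{mn} \sum_{i=1}^{n_\ell} D^{\ell *}_{mi}(w_{k-1}^{-1}) D^{\ell *}_{in}(s_k) \Big] \\ &= \exp \Sigma \Big\{ \Big[ \sum_{j=1}^{n_\ell} a^{1\ell,k-1}_{jn} D^\ell_{jm}(w_{k-1}^{-1}) \Big] D^\ell_{mn}(s_k) + \Big[ \sum_{j=1}^{n_\ell} a^{2\ell,k-1}_{jn} D^{\ell *}_{jm}(w_{k-1}^{-1}) \Big] D^{\ell *}_{mn}(s_k) \Big\}. \end{aligned} \qquad (35)$$

The second equality holds because $D^\ell$ is a matrix group representation, so $D^\ell(g_1 \circ g_2) = D^\ell(g_1) D^\ell(g_2)$; the last one only regroups the terms.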

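The following calculation shows that $p(m_k|s_k)$ is also an EFD(N):

$$\begin{aligned} p(m_k|s_k) &= \exp \Sigma \big( b^{1\ell}_{mn} D^\ell_{mn}(m_k \circ s_k^{-1}) + b^{2\ell}_{mn} D^{\ell *}_{mn}(m_k \circ s_k^{-1}) \big) \\ &= \exp \Sigma \Big\{ \Big[ \sum_{j=1}^{n_\ell} b^{2\ell}_{jm} D^{\ell *}_{jn}(m_k) \Big] D^\ell_{mn}(s_k) + \Big[ \sum_{j=1}^{n_\ell} b^{1\ell}_{jm} D^\ell_{jn}(m_k) \Big] D^{\ell *}_{mn}(s_k) \Big\}. \end{aligned} \qquad (36)$$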
The last equality holds because $D^\ell$ is a unitary matrix representation, so $D^\ell(g^{-1}) = [D^\ell(g)]^{*\prime}$, the conjugate transpose of $D^\ell(g)$.

We note that the complex conjugates $D^{\ell *}_{mn}$ are included in the EFD(N)'s in this subsection just to ensure that the above expression be an EFD(N). On many compact Lie groups, the complex conjugates are unnecessary. For instance, on the three dimensional rotation group we have $D^{\ell *}_{mn}(g) = (-1)^{m-n} D^\ell_{-m,-n}(g)$, m, n = −ℓ, −ℓ+1, ..., ℓ, so that the complex conjugates can be avoided.

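Substituting (35) and (36) into the Bayes rule, we obtain

$$\begin{aligned} p(s_k|m^k) &= c_k\, p(s_k|m^{k-1})\, p(m_k|s_k) \\ &= c_k \exp \Sigma \Big\{ \Big[ \sum_{j=1}^{n_\ell} \big( a^{1\ell,k-1}_{jn} D^\ell_{jm}(w_{k-1}^{-1}) + b^{2\ell}_{jm} D^{\ell *}_{jn}(m_k) \big) \Big] D^\ell_{mn}(s_k) + \Big[ \sum_{j=1}^{n_\ell} \big( a^{2\ell,k-1}_{jn} D^{\ell *}_{jm}(w_{k-1}^{-1}) + b^{1\ell}_{jm} D^\ell_{jn}(m_k) \big) \Big] D^{\ell *}_{mn}(s_k) \Big\}, \end{aligned}$$

where $c_k$ is a normalizing constant. This density is an EFD(N), which completes the proof of the following: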


Theorem 11. Let the signal and the measurement processes, $s_k$ and $m_k$, on a compact Lie group G be governed by

$$s_{k+1} = w_k \circ s_k, \qquad m_k = v_k \circ s_k.$$

Here $w_k$ is a sequence of known elements of G and $v_k$, the measurement noise process, is a sequence of independent random variables taking values on G. Suppose the probability densities of $s_0$ and $v_k$ are EFD(N)'s described by (32) and (33). Then for k = 1, 2, ..., the conditional density $p(s_k|m^k)$ is an EFD(N) of the form

$$p(s_k|m^k) = \exp \Sigma \big( a^{1\ell,k}_{mn} D^\ell_{mn}(s_k) + a^{2\ell,k}_{mn} D^{\ell *}_{mn}(s_k) \big).$$

The coefficients $a^{1\ell,k}_{mn}$ and $a^{2\ell,k}_{mn}$ are determined recursively by the formulas

$$a^{1\ell,k}_{mn} = \sum_{j=1}^{n_\ell} \big[ a^{1\ell,k-1}_{jn} D^\ell_{jm}(w_{k-1}^{-1}) + b^{2\ell}_{jm} D^{\ell *}_{jn}(m_k) \big],$$

$$a^{2\ell,k}_{mn} = \sum_{j=1}^{n_\ell} \big[ a^{2\ell,k-1}_{jn} D^{\ell *}_{jm}(w_{k-1}^{-1}) + b^{1\ell}_{jm} D^\ell_{jn}(m_k) \big],$$

and $a^{0,k}_{00}$ is a normalizing constant.
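The recursion of Theorem 11 can be checked numerically in the simplest setting. The sketch below assumes G is the circle group with the group operation written additively (angles mod 2π), $D^\ell(g) = e^{i\ell g}$ and $n_\ell = 1$, so that $D^\ell(w^{-1}) = e^{-i\ell w}$ and each coefficient matrix reduces to a scalar. It propagates the coefficients with the two update formulas and compares the result with a brute-force evaluation of Bayes' rule taken directly from the model. The prior and noise coefficients, the known elements $w_k$ and the measurement values are random placeholders: because the recursion is an algebraic identity in the data, any measurement values serve for the check. All function names are hypothetical.

```python
import numpy as np


def log_efd(theta, c1, c2):
    """sum_{l=1}^N (c1[l-1] e^{il theta} + c2[l-1] e^{-il theta}); real when c2 = conj(c1)."""
    ell = np.arange(1, len(c1) + 1)
    phase = np.exp(1j * np.outer(theta, ell))          # D^l(theta) for every grid point and l
    return np.real(phase @ c1 + np.conj(phase) @ c2)


def normalize(logp, dtheta):
    p = np.exp(logp - logp.max())
    return p / (p.sum() * dtheta)


def efd_filter_step(a1, a2, b1, b2, w_prev, m_k):
    """Theorem 11 on the circle group (D^l(g) = e^{ilg}, n_l = 1, group operation = addition):
    a1_new = a1 D^l(w_{k-1}^{-1}) + b2 D^{l*}(m_k),  a2_new = a2 D^{l*}(w_{k-1}^{-1}) + b1 D^l(m_k)."""
    ell = np.arange(1, len(a1) + 1)
    new_a1 = a1 * np.exp(-1j * ell * w_prev) + b2 * np.exp(-1j * ell * m_k)
    new_a2 = a2 * np.exp(1j * ell * w_prev) + b1 * np.exp(1j * ell * m_k)
    return new_a1, new_a2


def brute_force_log_posterior(theta, a1, a2, b1, b2, w, m):
    """Unnormalized ln p(s_K = theta | m_1..m_K) straight from s_{k+1} = w_k + s_k, m_k = v_k + s_k."""
    K = len(m)
    total = log_efd(theta - np.sum(w[:K]), a1, a2)     # prior evaluated at s_0
    for j in range(1, K + 1):
        s_j = theta - np.sum(w[j:K])                    # s_j expressed through s_K = theta
        total += log_efd(m[j - 1] - s_j, b1, b2)        # noise density evaluated at v_j = m_j - s_j
    return total


rng = np.random.default_rng(1)
N, steps = 3, 6
a1 = rng.normal(size=N) + 1j * rng.normal(size=N)      # prior coefficients of s_0 (placeholders)
a2 = np.conj(a1)
b1 = rng.normal(size=N) + 1j * rng.normal(size=N)      # noise coefficients (placeholders)
b2 = np.conj(b1)
w = rng.uniform(0, 2 * np.pi, size=steps)              # w[k] is the known element w_k
m = rng.uniform(0, 2 * np.pi, size=steps)              # m[k-1] is the measurement m_k

c1, c2 = a1.copy(), a2.copy()
for k in range(1, steps + 1):
    c1, c2 = efd_filter_step(c1, c2, b1, b2, w[k - 1], m[k - 1])

grid = 2 * np.pi * np.arange(400) / 400
dgrid = 2 * np.pi / 400
p_recursive = normalize(log_efd(grid, c1, c2), dgrid)
p_brute = normalize(brute_force_log_posterior(grid, a1, a2, b1, b2, w, m), dgrid)
```

Only 2N complex numbers are carried from step to step, however many measurements have been processed, which is the practical content of the theorem.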


V. 4. Estimation for Processes with Additive Noise.

In this subsection we will consider another model for which the estimation problem can be solved using EFD(N)'s. Suppose that the signal process $s_k$ evolves on a compact Lie group G according to the equation (31) and it is observed with additive noise $v_k$ through the p-dimensional vector-valued measurement process $m_k$,

$$m_k = h(s_k) + v_k.$$

Here h is a given square-integrable, p-dimensional vector-valued function on G, and $v_k$ is a sequence of p-dimensional independent Gaussian vectors, each having zero mean and with covariance matrices $E(v_k v_k') = R_k$.

The completeness property of the functions $\{D^\ell_{mn}\}$ assures us that, for any ε > 0 and for each component $h^j$ of the function h, there exist an integer $M_j$ and coefficients $h^{j\ell}_{mn}$ such that

$$\Big\| h^j(s) - \sum_{\ell=1}^{M_j} \sum_{m,n=1}^{n_\ell} h^{j\ell}_{mn} D^\ell_{mn}(s) \Big\|_2 < \epsilon, \qquad j = 1, 2, \ldots, p.$$

Let $M = \max_j M_j$ and denote by $h_M(s)$ the p-dimensional vector whose jth component is

$$\sum_{\ell=1}^{M_j} \sum_{m,n=1}^{n_\ell} h^{j\ell}_{mn} D^\ell_{mn}(s) = \sum_{\ell=1}^{M} \sum_{m,n=1}^{n_\ell} h^{j\ell}_{mn} D^\ell_{mn}(s),$$

with $h^{j\ell}_{mn} = 0$ for $\ell > M_j$. For abbreviation, we will denote the double summation notation by Σ.

Since the function h is a mathematical description which is necessarily an approximation of the physical phenomenon that it describes, we may as well use the equation $m_k = h_M(s_k) + v_k$ to represent the observation of the signal $s_k$.


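Each noise vector has density

$$p(v_k) = (2\pi)^{-p/2} (\det R_k)^{-1/2} \exp\Big[ -\frac12 v_k' R_k^{-1} v_k \Big] = (2\pi)^{-p/2} (\det R_k)^{-1/2} \exp\Big[ -\frac12 \sum_{i,j=1}^{p} R_k^{ij} v_k^i v_k^j \Big],$$

where $R_k^{-1}$ has components $R_k^{ij}$ and $v_k$ has components $v_k^i$. By substituting $v_k = m_k - h_M(s_k)$, we obtain

$$p(m_k|s_k) = (2\pi)^{-p/2} (\det R_k)^{-1/2} \exp\Big\{ -\frac12 \sum_{i,j=1}^{p} R_k^{ij} \big[m_k^i - \Sigma\, h^{i\ell}_{mn} D^\ell_{mn}(s_k)\big] \big[m_k^j - \Sigma\, h^{j\ell}_{mn} D^\ell_{mn}(s_k)\big] \Big\}$$

$$= (2\pi)^{-p/2} (\det R_k)^{-1/2} \exp\Big\{ C_0 + \Sigma\, C^\ell_{mn} D^\ell_{mn}(s_k) + \sum_{\ell,\ell'=1}^{M} \sum_{m,n=1}^{n_\ell} \sum_{m',n'=1}^{n_{\ell'}} C^{\ell\ell'}(m,n,m',n')\, D^\ell_{mn}(s_k)\, D^{\ell'}_{m'n'}(s_k) \Big\}.$$

Here,

$$C_0 = -\frac12 \sum_{i,j=1}^{p} R_k^{ij} m_k^i m_k^j,$$

$$C^\ell_{mn} = \frac12 \sum_{i,j=1}^{p} R_k^{ij} \big[ m_k^i h^{j\ell}_{mn} + m_k^j h^{i\ell}_{mn} \big],$$

$$C^{\ell\ell'}(m,n,m',n') = -\frac12 \sum_{i,j=1}^{p} R_k^{ij} h^{i\ell}_{mn} h^{j\ell'}_{m'n'}.$$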

We note that if the product function $D^\ell_{mn}(s_k) D^{\ell'}_{m'n'}(s_k)$ can be expressed as a linear combination of finitely many harmonic functions $D^i_{jk}(s_k)$ on G, then $p(m_k|s_k)$ is an EFD of finite dimension. Fortunately, this is indeed the case. The product function $D^\ell_{mn} D^{\ell'}_{m'n'}$ is a component of the direct product $D^\ell \times D^{\ell'}$ of $D^\ell$ and $D^{\ell'}$, which is itself a representation of G [35, p. 79]. As every finite dimensional representation of a compact Lie group is equivalent to the direct sum of a finite number of irreducible unitary representations [36, p. 333], the component $D^\ell_{mn} D^{\ell'}_{m'n'}$ of the finite dimensional representation $D^\ell \times D^{\ell'}$ is indeed a linear combination of finitely many $D^i_{jk}$'s, which we write

$$D^\ell_{mn} D^{\ell'}_{m'n'} = \sum_{i=1}^{M} \sum_{j,k=1}^{n_i} \lambda^{\ell\ell' i}_{mnm'n'jk} D^i_{jk}.$$

The λ's are constants and M is the maximum superscript of all the irreducible unitary representations which appear in the above mentioned direct sum.
It was shown in (35) that if $p(s_{k-1}|m^{k-1})$ is an EFD($N_{k-1}$), then $p(s_k|m^{k-1})$ is also an EFD($N_{k-1}$). Therefore, if $p(s_{k-1}|m^{k-1})$ is an EFD($N_{k-1}$), the conditional density $p(s_k|m^k)$, which is equal to $c_k\, p(s_k|m^{k-1})\, p(m_k|s_k)$, is an EFD($\max\{N_{k-1}, M\}$). Thus if $p(s_0)$ is an EFD(N) given by (32), then $p(s_k|m^k)$ will also be an EFD($\max\{M, N\}$) for all k = 1, 2, .... The recursive formulas for updating the coefficients $a^{\ell,k}_{mn}$ can be easily obtained by straightforward but tedious calculations, which are omitted here.

As remarked above, the determination of the λ's depends on the decomposition of the direct product $D^\ell \times D^{\ell'}$ of irreducible representations $D^\ell$ and $D^{\ell'}$. Such a decomposition is not always easy but, fortunately, such decompositions have been thoroughly studied and documented for many special groups including SO(3). The interested reader is referred to [40, p. 80] and [39, p. 155] for further discussions and references.
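On the circle group the decomposition of a direct product is trivial, $e^{i\ell s} e^{i\ell' s} = e^{i(\ell+\ell')s}$, so the mechanism of this subsection fits in a few lines. The sketch below assumes a scalar measurement (p = 1) with variance r and a real-valued h given by its coefficients for ℓ = −M, ..., M. It computes the harmonic coefficients of the exponent of $p(m_k|s_k)$ (up to the constant prefactor), which occupy ℓ = −2M, ..., 2M, and checks them against direct evaluation. In the notation above, the maximal superscript appearing after the decomposition is therefore 2M on this group. Function names and numerical values are illustrative assumptions.

```python
import numpy as np


def trig_eval(coeffs, theta):
    """Evaluate sum_{l=-L}^{L} coeffs[l + L] e^{il theta}."""
    L = (len(coeffs) - 1) // 2
    ell = np.arange(-L, L + 1)
    return np.exp(1j * np.outer(theta, ell)) @ coeffs


def log_likelihood_coeffs(h, m, r):
    """Harmonic coefficients (l = -2M..2M) of -(m - h(s))^2 / (2r) for a scalar measurement m.

    h holds the coefficients of h(s) for l = -M..M. The degree doubles because
    e^{ils} e^{il's} = e^{i(l + l')s}, the circle-group case of decomposing D^l x D^l'.
    """
    M = (len(h) - 1) // 2
    out = -np.convolve(h, h).astype(complex)      # -h(s)^2
    out[M:3 * M + 1] += 2.0 * m * h               # +2 m h(s), placed in the degree-2M index range
    out[2 * M] -= m * m                           # -m^2
    return out / (2.0 * r)


# A real-valued h of degree M = 2 (h_{-l} = conj(h_l)), scalar measurement m_k and variance r.
h = np.array([0.2 - 0.1j, 0.5 + 0.3j, 1.0, 0.5 - 0.3j, 0.2 + 0.1j])
m_k, r = 0.7, 0.25
theta = 2 * np.pi * np.arange(200) / 200
direct = -(m_k - trig_eval(h, theta).real) ** 2 / (2.0 * r)
via_coeffs = trig_eval(log_likelihood_coeffs(h, m_k, r), theta)
```

Adding these coefficients to those of $p(s_k|m^{k-1})$ gives the posterior exponent, an EFD of order $\max\{N, 2M\}$ on this group.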

V.  5.  An  Example  - Orientation  Estimation  of  a Rigid  Body  Rotation. 

The state space of a rigid body rotation is the three dimensional rotation group, denoted by SO(3). A common way to parametrize this group is to use the triple of Euler angles $0 \le \phi < 2\pi$, $0 \le \theta \le \pi$, $0 \le \psi < 2\pi$. Thus, each element of SO(3) is expressed uniquely (except when θ = 0 or θ = π, where only φ + ψ or φ − ψ, respectively, is determined) as the result of a sequence of rotations through these angles about the z-, x-, and z-axes.

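We will use a sequence of finite dimensional unitary representations $\{D^\ell(\phi, \theta, \psi),\ \ell = 0, 1, \ldots\}$ attributed to E. P. Wigner. The components are described in [35, p. 144] by

$$D^\ell_{mn}(\phi, \theta, \psi) = e^{-im\phi} P^\ell_{mn}(\cos\theta)\, e^{-in\psi},$$

where m and n are integers such that $-\ell \le m, n \le \ell$, and $P^\ell_{mn}(\cos\theta)$ is the explicit function of $\cos\theta$ given in [35, p. 144], obtained by differentiating a product of powers of $\cos\theta - 1$ and $1 + \cos\theta$ with respect to $\cos\theta$ and normalizing. The functions $D^\ell_{mn}$ form a complete orthogonal system in the space of square integrable functions on SO(3) with respect to the inner product

$$\langle f_1, f_2 \rangle = \int f_1(g) f_2^*(g)\, dg = \frac{1}{8\pi^2} \int_0^{2\pi}\!\!\int_0^{\pi}\!\!\int_0^{2\pi} f_1(\phi, \theta, \psi) f_2^*(\phi, \theta, \psi) \sin\theta\, d\phi\, d\theta\, d\psi.$$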

An EFD(N) is a probability density on SO(3) of the form

$$p(\phi, \theta, \psi) = \exp \sum_{\ell=0}^{N} \sum_{m,n=-\ell}^{\ell} a^\ell_{mn} D^\ell_{mn}(\phi, \theta, \psi),$$

where $a^0_{00}$ is a normalizing constant. By Theorem 10, any continuous probability density function can be approximated as closely as desired by such an EFD(N) in the aforementioned inner product space.
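The orthogonality that underlies these expansions can be verified numerically. Since the factors $e^{-im\phi}$ and $e^{-in\psi}$ integrate out to $4\pi^2$ times Kronecker deltas, only the θ-dependent factor needs checking. The sketch below implements the Wigner small-d function $d^\ell_{mn}(\beta)$ by Wigner's explicit sum in the common physics convention. This convention is an assumption here and may differ from the $P^\ell_{mn}$ of [35] by phase factors, which affect neither orthogonality nor unitarity. It checks that $\int_0^\pi d^\ell_{mn} d^{\ell'}_{mn} \sin\beta\, d\beta = 2\delta_{\ell\ell'}/(2\ell+1)$, which with the factor $1/(8\pi^2)$ gives $\langle D^\ell_{mn}, D^{\ell'}_{mn}\rangle = \delta_{\ell\ell'}/(2\ell+1) = \delta_{\ell\ell'}/\dim D^\ell$, in agreement with Theorem 8. Function names are hypothetical.

```python
import numpy as np
from math import factorial


def wigner_small_d(l, m, n, beta):
    """d^l_{mn}(beta) by Wigner's explicit sum (common physics convention; an assumption here)."""
    beta = np.asarray(beta, dtype=float)
    c, s = np.cos(beta / 2.0), np.sin(beta / 2.0)
    pref = np.sqrt(float(factorial(l + m) * factorial(l - m) * factorial(l + n) * factorial(l - n)))
    total = np.zeros(beta.shape)
    for k in range(max(0, n - m), min(l + n, l - m) + 1):
        denom = factorial(l + n - k) * factorial(k) * factorial(m - n + k) * factorial(l - m - k)
        total = total + (-1) ** (m - n + k) * pref / denom * c ** (2 * l + n - m - 2 * k) * s ** (m - n + 2 * k)
    return total


def theta_inner_product(l1, l2, m, n, nodes=40):
    """Integral over [0, pi] of d^{l1}_{mn} d^{l2}_{mn} sin(beta) dbeta via Gauss-Legendre in x = cos(beta)."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    beta = np.arccos(x)
    return float(np.sum(w * wigner_small_d(l1, m, n, beta) * wigner_small_d(l2, m, n, beta)))


def d_matrix(l, beta):
    """(2l+1) x (2l+1) matrix [d^l_{mn}(beta)], rows m = -l..l, columns n = -l..l."""
    idx = range(-l, l + 1)
    return np.array([[float(wigner_small_d(l, m, n, beta)) for n in idx] for m in idx])


# Gram entries for m = 1, n = 0 and l, l' in {1, 2, 3}; one unitary (real orthogonal) d-matrix.
gram = {(l1, l2): theta_inner_product(l1, l2, 1, 0) for l1 in range(1, 4) for l2 in range(1, 4)}
D2 = d_matrix(2, 0.7)
```

The substitution x = cos β turns $\sin\beta\,d\beta$ into dx, and the integrand becomes a polynomial in x, so a modest number of Gauss-Legendre nodes integrates it exactly up to rounding.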

Let us now consider the following estimation problem. The signal process is a sequence of random rotations on SO(3) which satisfies $s_{k+1} = w_k \circ s_k$, ∘ denoting product rotation, for some sequence of known rotations $w_k$. The measurement $m_k$ is a concatenation of the signal $s_k$ and the rotational white noise $v_k$, i.e., $m_k = v_k \circ s_k$. Suppose it is known that $s_0$ and $v_k$ have EFD(N)'s which are described by

$$p(s_0) = \exp \sum_{\ell=0}^{N} \sum_{m,n=-\ell}^{\ell} a^{\ell,0}_{mn} D^\ell_{mn}(s_0),$$

$$p(v_k) = \exp \sum_{\ell=0}^{N} \sum_{m,n=-\ell}^{\ell} b^\ell_{mn} D^\ell_{mn}(v_k).$$

We would like to find the optimal estimate of $s_k$ on SO(3) given the measurements $m^k = \{m_1, \ldots, m_k\}$, with respect to an error criterion which provides a measure of the deviation of the estimated orientation of the signal rotation $s_k$ from the orientation of the signal rotation itself.


Following the calculation in subsection V.3, we can show that the conditional density $p(s_k|m^k)$ is an EFD(N) of the form

$$p(s_k|m^k) = \exp \sum_{\ell=0}^{N} \sum_{m,n=-\ell}^{\ell} a^{\ell,k}_{mn} D^\ell_{mn}(s_k),$$

where the coefficients $a^{\ell,k}_{mn}$ are determined recursively by the formulas

$$a^{\ell,k}_{mn} = \sum_{j=-\ell}^{\ell} \big\{ a^{\ell,k-1}_{jn} D^\ell_{jm}(w_{k-1}^{-1}) + (-1)^{m-n} b^\ell_{j,-m} D^\ell_{j,-n}(m_k) \big\}, \qquad \ell \ge 0,\ k = 1, 2, \ldots, \qquad (32)$$

and $a^{0,k}_{00}$ is a normalizing constant. These formulas enable us to calculate the sequential conditional densities by updating recursively a finite and fixed number of parameters.

In order to define an error criterion for orientation estimation, it is necessary to have a measure of the distance between two orientations. We will first describe such a measure, using quaternions [41].

We recall that a rotation about an axis in the direction of a unit vector $[l, m, n]'$ through an angle β is represented by the (unit) quaternion

$$q = [q_1, q_2, q_3, q_4]' = \Big[\cos\frac{\beta}{2},\ l \sin\frac{\beta}{2},\ m \sin\frac{\beta}{2},\ n \sin\frac{\beta}{2}\Big]'.$$

Given two orientations, the minimal angle in radians required to bring one into the other is a natural measure of distance between them and defines a Riemannian metric on SO(3). If the orientations are represented by the quaternions q and p, and the minimal angle is denoted by ρ(q, p), then we have $|q'p| = \cos\frac12\rho(q, p)$ (the absolute value is needed because q and −q represent the same rotation). As $(1 - \cos\rho)/2$ is a monotone increasing function of ρ on [0, π], a measure of distance between p and q can be defined to be $(1 - \cos\rho(q, p))/2 = 1 - (q'p)^2$. It can be shown that if the orientations q and p are described by the 3 × 3 orthogonal matrices Q and P, then this measure of distance can also be expressed as $(3 - \operatorname{tr} PQ')/4$.
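The two expressions for the distance can be compared numerically. The sketch below assumes the scalar-first convention $q = [\cos\frac{\beta}{2}, \sin\frac{\beta}{2}[l, m, n]]'$ used above and the standard formula for the rotation matrix of a unit quaternion. It evaluates $1 - (q'p)^2$ and $(3 - \operatorname{tr} PQ')/4$ for random unit quaternions, confirms the invariance under p → −p, and checks the value $(1 - \cos\beta)/2$ for a single rotation through β measured against the identity. Function names are illustrative.

```python
import numpy as np


def quat_to_matrix(q):
    """Rotation matrix of the unit quaternion q = [q1, q2, q3, q4] (scalar part first)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def quat_distance(q, p):
    """1 - (q'p)^2 = (1 - cos rho(q, p)) / 2."""
    return 1.0 - float(np.dot(q, p)) ** 2


def matrix_distance(Q, P):
    """(3 - tr(P Q')) / 4, the same distance computed from rotation matrices."""
    return (3.0 - np.trace(P @ Q.T)) / 4.0


rng = np.random.default_rng(0)
q = rng.normal(size=4)
q /= np.linalg.norm(q)
p = rng.normal(size=4)
p /= np.linalg.norm(p)
d_quat = quat_distance(q, p)
d_mat = matrix_distance(quat_to_matrix(q), quat_to_matrix(p))

# Rotation through beta = 0.9 about the unit axis [l, m, n]' = [0, 0.6, 0.8]', compared with the identity.
beta = 0.9
q_beta = np.array([np.cos(beta / 2), 0.0, 0.6 * np.sin(beta / 2), 0.8 * np.sin(beta / 2)])
d_beta = quat_distance(q_beta, np.array([1.0, 0.0, 0.0, 0.0]))
```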

We are now ready to define the error criterion for orientation estimation. Let q be a random quaternion and p its estimate. Then a measure of the estimation error is

$$J(q, p) = E\big(1 - (q'p)^2\big).$$

If the probability distribution of q is given, the estimate p which minimizes J may be obtained from observing that

$$J(q, p) = 1 - p' E(qq') p.$$

It is well known that the quadratic form p'Vp of the symmetric positive semidefinite matrix V = E(qq') is maximized over unit vectors p when p is an eigenvector associated with the largest eigenvalue λ of V. Moreover, the maximum value is λ. Hence,

$$\min_p J(q, p) = 1 - \hat q' E(qq') \hat q = 1 - \lambda,$$

where

λ = the maximum eigenvalue of E(qq'),
$\hat q$ = the unit eigenvector of E(qq') associated with λ.
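In practice E(qq') has to be evaluated somehow. The sketch below simply replaces it by a weighted average over quaternion samples (an assumption made for illustration, e.g. samples drawn from the conditional density), forms V, and takes the eigenvector of its largest eigenvalue with `numpy.linalg.eigh`. Because only the outer products qq' enter, the sign ambiguity between q and −q does not matter. The check confirms that the attained error equals 1 − λ and that random unit vectors do no better. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def optimal_quaternion_estimate(samples, weights=None):
    """Return (q_hat, lam): the unit eigenvector for the largest eigenvalue lam of V = E(qq').

    E(qq') is replaced by a weighted average over quaternion samples (an assumption of this sketch).
    """
    samples = np.asarray(samples, dtype=float)
    if weights is None:
        weights = np.full(len(samples), 1.0 / len(samples))
    V = (samples * weights[:, None]).T @ samples          # sum_i w_i q_i q_i'
    eigvals, eigvecs = np.linalg.eigh(V)                  # eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1]


def expected_error(samples, p, weights=None):
    """Sample version of J(q, p) = E(1 - (q'p)^2)."""
    samples = np.asarray(samples, dtype=float)
    if weights is None:
        weights = np.full(len(samples), 1.0 / len(samples))
    return float(np.sum(weights * (1.0 - (samples @ p) ** 2)))


rng = np.random.default_rng(0)
q0 = np.array([0.9, 0.3, -0.2, 0.25])
q0 /= np.linalg.norm(q0)
samples = q0 + 0.1 * rng.normal(size=(500, 4))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
samples *= rng.choice([-1.0, 1.0], size=(500, 1))        # q and -q are the same rotation
q_hat, lam = optimal_quaternion_estimate(samples)
min_error = expected_error(samples, q_hat)
```

Averaging the quaternions themselves would fail here, because the random sign flips cancel them out; the second-moment matrix is insensitive to those flips.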


Using the conditional density $p(s_k|m^k)$ that is computed recursively through (32), the optimal estimate of the orientation can then be determined as follows. First compute the conditional covariance matrix $E(q(k) q'(k)|m^k)$, where q(k) is the quaternion for $s_k$, whose components expressed in terms of the Euler angles are given below:

$$q_1 = \cos\frac{\theta}{2}\cos\frac{\phi + \psi}{2}, \quad q_2 = \sin\frac{\theta}{2}\cos\frac{\phi - \psi}{2}, \quad q_3 = \sin\frac{\theta}{2}\sin\frac{\phi - \psi}{2}, \quad q_4 = \cos\frac{\theta}{2}\sin\frac{\phi + \psi}{2}. \qquad (33)$$


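Then use some standard numerical method to compute the largest eigenvalue λ(k) and the associated unit eigenvector $\hat q(k|k)$. The Euler angles $(\hat\phi, \hat\theta, \hat\psi)$ of the optimal estimate may then be determined from $\hat q(k|k)$ through the equations

$$\cos\hat\theta = 2(\hat q_1^2 + \hat q_4^2) - 1, \qquad 0 \le \hat\theta \le \pi,$$

$$\sin\hat\phi = \frac{\hat q_1 \hat q_3 + \hat q_2 \hat q_4}{\Delta}, \qquad \cos\hat\phi = \frac{\hat q_1 \hat q_2 - \hat q_3 \hat q_4}{\Delta},$$

$$\sin\hat\psi = \frac{\hat q_2 \hat q_4 - \hat q_1 \hat q_3}{\Delta}, \qquad \cos\hat\psi = \frac{\hat q_1 \hat q_2 + \hat q_3 \hat q_4}{\Delta},$$

with

$$\Delta = \sqrt{(\hat q_1^2 + \hat q_4^2)(\hat q_2^2 + \hat q_3^2)}.$$

This simply inverts the set of relationships (33). The estimation error is $1 - \lambda(k)$.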
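Relations (33) and their inverse can be checked by a round trip. The sketch below implements both directions as written above (z-x-z Euler angles, scalar-first quaternion), restricted to 0 < θ < π where Δ > 0, and uses `arctan2` on the sine and cosine expressions to recover each angle in the correct quadrant. As an independent check, the test composes the elementary quaternions of the rotations about z, x and z with the Hamilton product, which should reproduce (33). Function names are illustrative.

```python
import numpy as np


def euler_to_quat(phi, theta, psi):
    """Quaternion of the z-x-z Euler rotation (phi, theta, psi), per relations (33)."""
    return np.array([
        np.cos(theta / 2) * np.cos((phi + psi) / 2),
        np.sin(theta / 2) * np.cos((phi - psi) / 2),
        np.sin(theta / 2) * np.sin((phi - psi) / 2),
        np.cos(theta / 2) * np.sin((phi + psi) / 2),
    ])


def quat_to_euler(q):
    """Invert (33); requires 0 < theta < pi (Delta > 0). Returns angles in [0, 2pi) x [0, pi] x [0, 2pi)."""
    q1, q2, q3, q4 = q
    theta = np.arccos(np.clip(2 * (q1 ** 2 + q4 ** 2) - 1, -1.0, 1.0))
    delta = np.sqrt((q1 ** 2 + q4 ** 2) * (q2 ** 2 + q3 ** 2))
    phi = np.arctan2((q1 * q3 + q2 * q4) / delta, (q1 * q2 - q3 * q4) / delta)
    psi = np.arctan2((q2 * q4 - q1 * q3) / delta, (q1 * q2 + q3 * q4) / delta)
    return np.mod(phi, 2 * np.pi), theta, np.mod(psi, 2 * np.pi)


angles = (1.2, 0.8, 4.0)
q = euler_to_quat(*angles)
recovered = quat_to_euler(q)
recovered_neg = quat_to_euler(-q)        # -q is the same rotation; the formulas are quadratic in q
```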
VI. DETECTION FOR CONTINUOUS-TIME SYSTEMS ON LIE GROUPS.

The  idea  of  "rolling  without  slipping"  introduced  in  Section  IV 
will  now  be  generalized  and  used  to  formulate  an  observation  process 
on an arbitrary matrix Lie group. Briefly, we will inject
the  differentials  of  an  observation  process  described  by  a vector  Ito 
differential  equation  into  a Lie  group  via  the  exponential  map  and  then 
piece  them  together.  The  resulting  product  integral  describes  our 
observation  process  on  the  Lie  group.  The  injected  vector  observation 
process  is  called  its  skew  form. 

The  observation  process  thus  constructed  on  a Lie  group  will  be 
seen  to  satisfy  a bilinear  matrix  stochastic  differential  equation, 
when  its  skew  form  is  linear.  The  observational  noise  can  be  viewed  as 
entering  multiplicatively. 

Given  an  arbitrary  bilinear  matrix  observation  process,  we  will 
show  that  the  corresponding  skew  observation  process  can  be  obtained  by 
"reversing"  the  above  injecting  procedure.  Further,  these  two 
procedures  will  be  seen  to  induce  two  "almost  sure"  bijective  mappings 
between  a vector-valued  and  a matrix-valued  function  space,  one  being 
the  inverse  of  the  other. 

It  is  well  known  that  the  study  of  a Lie  group  may  be  greatly 
simplified  by  considering  the  tangent  space  (the  Lie  algebra)  of  the 
Lie  group  at  its  identity.  In  fact,  the  local  study  of  a Lie  group  is 
entirely  equivalent  to  the  study  of  the  algebraic  structure  of  the 
Lie algebra. In this paper, the above bijective mappings facilitate a similar simplification. They enable us to evaluate the likelihood ratio in a finite dimensional linear space, the Lie algebra.

In view of the above construction, the null and the alternative hypotheses that the signal is respectively absent and present in the observation on a Lie group can be written in terms of a pair of bilinear matrix stochastic differential equations. Using the bijective mappings, we may transform these hypotheses on a Lie group into those on the corresponding Lie algebra. There the likelihood ratio can be expressed by the well-known formula in [43] and [44]. Thus the likelihood ratio on a Lie group can also be evaluated through least-squares estimation.

When the signal is a linear diffusion process, the idea of using the bijective mappings to work in the Lie algebra also leads to a finite dimensional filtering equation for evaluating the least-squares estimate. This equation is indeed an immediate extension of the Kalman-Bucy filter to the case of observation on Lie groups.
VI. 1. Almost Sure Representation of Continuous Curves on Lie Groups.

Let $\mathbb{R}^{n \times n}$ denote the set of real n × n matrices and $\{R_1, \ldots, R_m\}$ a basis of a Lie algebra L in $\mathbb{R}^{n \times n}$. Then the set

$$G = \{ M : M = \exp(A_1)\exp(A_2)\cdots\exp(A_k);\ A_l \in L,\ l = 1, \ldots, k;\ k = 0, 1, \ldots \}$$

is an m-dimensional Lie group related to L by a one-to-one map from a neighborhood of 0 ∈ L onto a neighborhood of M ∈ G. The map is defined by

$$\psi_M(A) = \exp(A) M, \qquad A \in L.$$

A continuous curve in G is usually represented by an n × n-matrix-valued continuous function on a closed interval T = [0, s] of the real line. In this section we will show that, under certain assumptions, a continuous curve in G starting from the identity element I ∈ G can also be represented by an m-vector-valued function on T in a certain "almost sure" sense.

We will use the following notation:

$C_m$ = the family of continuous m-vector-valued functions, a, on T with initial value a(0) = 0,
$C_g$ = the family of continuous n × n-matrix-valued functions, A, on T such that A(t) is in G for each t in T and with initial value A(0) = I,
$B_m$ = the Borel σ-field of $C_m$ in the uniform topology,
$B_g$ = the Borel σ-field of $C_g$ in the uniform topology,
w = a standard p-vector Brownian motion on a probability space (Ω, F, P),
$M(t) = \frac12 \sum_{i,j=1}^{m} Q_{ij}(t) R_i R_j$, where $Q(t) = Q^{1/2}(t)(Q^{1/2}(t))'$,
[ ] = "the integral part of".

Lower case letters will denote elements in $C_m$ and upper case letters will denote elements in $C_g$.

Let y be an m-vector stochastic process on T satisfying the following Ito differential equation:

$$dy(t) = f(t)\, dt + Q^{1/2}(t)\, dw(t), \qquad (34)$$

where f is an m-vector stochastic process on T and $Q^{1/2}: T \to \mathbb{R}^{m \times p}$ is Borel-measurable and bounded, i.e.,

$$\|Q^{1/2}(t)\|^2 = \operatorname{tr} Q^{1/2}(t)(Q^{1/2}(t))' \le c, \qquad t \in T. \qquad (35)$$

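Let $H_n: C_m \to C_g$ be defined by

$$(H_n(a))(t) = I \qquad (t = 0), \qquad (36)$$

$$(H_n(a))(t) = \exp\Big[ \sum_{j=1}^{m} \big(a_j(t) - a_j(l2^{-n})\big) R_j \Big] (H_n(a))(l2^{-n}) \qquad (t \ne 0,\ l = [2^n t]),$$

for $a = [a_1, a_2, \ldots, a_m]' \in C_m$.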

Let $K(\Delta) = \sum_j \Delta y_j R_j = \sum_j \big(y_j(t) - y_j(l2^{-n})\big) R_j$ and $Y_n(t) = (H_n(y))(t)$. Then

$$Y_n(t) - Y_n(l2^{-n}) = \big(\exp(K(\Delta)) - I\big) Y_n(l2^{-n}).$$

Recall the following oscillation property [?, p. 57]:

$$P\Big[ \limsup_{\substack{t = t_2 - t_1 \downarrow 0 \\ 0 \le t_1 < t_2 \le s}} \frac{|y(t_2) - y(t_1)|}{(2t \lg(1/t))^{1/2}} \le \max_{t \in T} \|Q^{1/2}(t)\| \Big] = 1. \qquad (37)$$

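It is clear that, up to terms involving $K^3(\Delta)$ (which are of higher order than $2^{-n}$ by (37)),

$$Y_n(t) - Y_n(l2^{-n}) \approx \Big( K(\Delta) + \frac12 K^2(\Delta) \Big) Y_n(l2^{-n}).$$

By simple calculations,

$$K^2(\Delta) \approx \sum_{i,j} Q_{ij}(t) R_i R_j\, \Delta t.$$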

Thus the definition of the Ito integral leads at once to the conjecture that $Y = \lim_{n\to\infty} Y_n$ is the solution of

$$dY(t) = \Big[ \sum_j R_j\, dy_j(t) + M(t)\, dt \Big] Y(t), \qquad Y(0) = I. \qquad (38)$$

It is appropriate to remark here that the sequence of operators $H_n$ was first devised in [?] to construct Brownian motion on the three dimensional rotation group, SO(3), and later used in [45] to construct Brownian motion on a Lie group. Exactly the same trick was used to formulate an observation process on SO(3) in [47]. In this paper, this trick together with some techniques developed in [?] will be used, with little modification, to treat a large class of detection problems on arbitrary matrix Lie groups.

Following closely the six steps taken in Section 4.8 of [45] and keeping in mind the assumption (35) and the oscillation property (37), we come to the following conclusions:
(i) There is one and only one solution to (38).
(ii) The solution can be expressed as an almost surely convergent series as follows:

$$Y(t) = \sum_{n \ge 0} \Theta_n(t), \qquad \Theta_0(t) = I, \qquad \Theta_n(t) = \int_0^t \Big( \sum_j R_j\, dy_j(\tau) + M(\tau)\, d\tau \Big) \Theta_{n-1}(\tau).$$

(iii) The sequence $\{Y_n\}$ converges uniformly on T to the solution Y of (38) almost surely. In other words, $\{H_n(a)\}$ converges uniformly on T to a continuous function $H(a) \in C_g$ for each element a of a $B_m$-measurable set $B_1$ such that $\mu_y(B_1) = 1$, where $\mu_y$ denotes the measure on $(C_m, B_m)$ induced by y.


The operator $H = \lim_{n\to\infty} H_n$ is the so-called product integral operator, which is usually used to solve matrix differential equations (see, e.g., [46] and [49]). Its application here to construct random processes on a Lie group by the use of random processes on its Lie algebra yields a random matrix differential equation (38), which is a global representation of the constructed random process on the Lie group, rather than the usual local representations for random processes on differentiable manifolds and Lie groups (see, e.g., [50] and [?]). This feature is important in order to draw useful results for engineering purposes, as a sample path of a diffusion process may zigzag across the boundary of a coordinate neighborhood infinitely often over a fixed time interval ([13], [14]).
In the following we will consider the converse problem of inducing a random process on a vector space by a random process on a Lie group. More specifically, we will construct the inverse operator, J, of H by defining the appropriate "inverse" operator $J_n$ of $H_n$.

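Let $C_L$ denote the family of n × n matrix-valued continuous functions which are representations of continuous curves on L. Let $A \in C_g$ and let $n_A$ be the smallest integer such that for all $n \ge n_A$ and $0 \le i \le [s2^n]$,

$$\big\| A((i+1)2^{-n}) A^{-1}(i2^{-n}) - I \big\| < 1.$$

Define $K_n: C_g \to C_L$ by

$$(K_n(A))(t) = 0 \qquad (t \in T) \qquad (40)$$

for $n < n_A$, and

$$(K_n(A))(t) = 0 \qquad (t = 0), \qquad (41)$$

$$(K_n(A))(t) = (K_n(A))(l2^{-n}) + \lg\big( A(t) A^{-1}(l2^{-n}) \big) \qquad (t \ne 0,\ l = [t2^n]),$$

for $n \ge n_A$, where lg denotes the matrix logarithm, whose series $\lg B = \sum_{k \ge 1} (-1)^{k-1} (B - I)^k / k$ converges when $\|B - I\| < 1$. Writing $(K_n(A))(t) = \sum_j R_j [(K_n(A))(t)]_j$ in terms of the basis $\{R_j\}$, we define $J_n: C_g \to C_m$ by

$$(J_n(A))(t) = \big[ [(K_n(A))(t)]_1, \ldots, [(K_n(A))(t)]_m \big]' \qquad (42)$$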


for $A \in C_g$. We will now show that $\{J_n(A)\}$ converges uniformly on T to a continuous function $J(A) \in C_m$, for almost all A with respect to the measure $\mu_Y$ on $(C_g, B_g)$ induced by Y, constructed previously. Let

$$\Omega_n^i = \big\{ \omega \in \Omega : \| Y(i2^{-n}, \omega) Y^{-1}((i-1)2^{-n}, \omega) - I \| < 1 \big\},$$

$$\Omega_n^t = \big\{ \omega \in \Omega : \| Y(t, \omega) Y^{-1}(l2^{-n}, \omega) - I \| < 1 \big\},$$

$$\Omega_n = \Omega_n^t \cap \bigcap_{i=1}^{l} \Omega_n^i.$$

Recalling (37) and (38), it can be easily seen that

$$\lim_{n\to\infty} P(\Omega_n) = 1.$$


For notational simplicity, we will denote $(K_n(Y))(t)$ and

$$(K(Y))(t) = -\int_0^t M(s)\, ds + \int_0^t (dY(s)) Y^{-1}(s)$$

by $K_n(t)$ and $K(t)$, respectively. Let

$$\hat K_n(t) = \sum_{i=1}^{l} \Big[ -M((i-1)2^{-n})\, 2^{-n} + \big( Y(i2^{-n}) - Y((i-1)2^{-n}) \big) Y^{-1}((i-1)2^{-n}) \Big] + \Big[ -M(l2^{-n})(t - l2^{-n}) + \big( Y(t) - Y(l2^{-n}) \big) Y^{-1}(l2^{-n}) \Big]. \qquad (43)$$

Then

$$E\|K_n(t) - K(t)\|^2 = E\big[ \operatorname{tr}\big[ (K_n(t) - K(t))(K_n(t) - K(t))' \big] \big] \le 2E\|K_n(t) - \hat K_n(t)\|^2 + 2E\|\hat K_n(t) - K(t)\|^2. \qquad (44)$$


Note that $K_n(t) = 0$ on $\Omega - \Omega_n$ by (40) and the definition of $n_A$. Hence

$$E\|K_n(t) - \hat K_n(t)\|^2 = \int_{\Omega_n} \|K_n(t) - \hat K_n(t)\|^2\, dP + \int_{\Omega - \Omega_n} \|\hat K_n(t)\|^2\, dP. \qquad (45)$$


By the definition of the Ito integral,

$$\lim_{n\to\infty} E\|\hat K_n(t) - K(t)\|^2 = 0. \qquad (46)$$
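It is easy to see that

$$\lim_{n\to\infty} \int_{\Omega - \Omega_n} \|\hat K_n(t)\|^2\, dP \le \lim_{n\to\infty} 2\int_{\Omega - \Omega_n} \|\hat K_n(t) - K(t)\|^2\, dP + \lim_{n\to\infty} 2\int_{\Omega - \Omega_n} \|K(t)\|^2\, dP, \qquad (47)$$

where the first term on the right hand side vanishes because of (46) and the second term vanishes because $\lim_{n\to\infty} P(\Omega - \Omega_n) = 0$. Substituting (41) and (43) into the first term on the right hand side of (45) yields

$$\begin{aligned} \int_{\Omega_n} \|K_n(t) - \hat K_n(t)\|^2\, dP \le{}& 2\int_{\Omega_n} \Big\| -\sum_{i=1}^{l} M((i-1)2^{-n})\big(i2^{-n} - (i-1)2^{-n}\big) + \frac12 \sum_{i=1}^{l} \big( Y(i2^{-n}) Y^{-1}((i-1)2^{-n}) - I \big)^2 \\ &\qquad - M(l2^{-n})(t - l2^{-n}) + \frac12 \big( Y(t) Y^{-1}(l2^{-n}) - I \big)^2 \Big\|^2 dP \\ &+ 2\int_{\Omega_n} \Big\| \sum_{i=1}^{l} \sum_{k=3}^{\infty} \frac{(-1)^{k-1}}{k} \big( Y(i2^{-n}) Y^{-1}((i-1)2^{-n}) - I \big)^k + \sum_{k=3}^{\infty} \frac{(-1)^{k-1}}{k} \big( Y(t) Y^{-1}(l2^{-n}) - I \big)^k \Big\|^2 dP. \end{aligned}$$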


With a view to (38), it can be easily proved that

$$\lim_{n\to\infty} \int_{\Omega_n} \|K_n(t) - \hat K_n(t)\|^2\, dP = 0. \qquad (48)$$


Combining (44)-(48) completes the proof that $(K_n(Y))(t)$ converges to $(K(Y))(t)$ in quadratic mean. Hence there is a subsequence $\{n'\}$ of $\{n\}$ such that with probability one

$$\lim_{n'\to\infty} (K_{n'}(Y))(t) = (K(Y))(t) = -\int_0^t M(s)\, ds + \int_0^t (dY(s)) Y^{-1}(s). \qquad (49)$$

Then it is easy to see from (42) that $\{J_{n'}(A)\}$ converges uniformly on T to
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J(A)  = [I(K  (A))(t)l,,....[(K  (A))(t)]^]*  c C, 

1 ID  X 

for  each  element  A of  a B -measurable  set  B-  C C such  that 

g 2 . g 

= 1 . Furthermore,  J is  Injective  on  B2  because  the  differential 
equation  i;38)wlth  Y viewed  as  the  unknown  function  has  a unique  solution 
so  that  K(Aj^)  = KCA^)  Implies  that  “ A2  , In  view  of  (49)  . 

Comparing (49) with (38), we see that for each $a \in B_1$, $H(a)$ is an element of $B_2$ and $J(H(a)) = a$ (that is, $J = H^{-1}$). We conclude that almost all continuous curves in $G$ (with respect to the measure induced by $Y$) can also be represented by continuous $m$-vector-valued functions on $T$. Summarizing what has been shown, we obtain the following theorem.

Theorem 12. Let $y$ be the solution of (34), and let $H_n: C^m \to C_g$ and $J_n: C_g \to C^m$ be defined by (36) and (42). Then $\{H_n(y)\}$ converges to $Y = H(y)$ with probability one, and $Y$ satisfies

$$dY(t) = \Big[\sum_{i=1}^{m} R_i\,dy_i(t) + \frac12\sum_{i=1}^{m}\sum_{j=1}^{m} Q_{ij}(t)R_iR_j\,dt\Big]Y(t)$$

with initial value $Y(0) = I$. Conversely, $\{J_{n'}(Y)\}$ converges to $y = J(Y)$ with probability one, and $y$ satisfies the above equation too. In fact, $J = H^{-1}$ almost surely.
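As a numerical illustration of Theorem 12 (not taken from the source), the sketch below uses the simplest commutative case. The assumptions are: $G = SO(2)$, $m = 1$, a single generator $R$ with $R^2 = -I$, and a scalar Brownian $y$ with $dy\,dy = q\,dt$. In that case $Y(t) = \exp(R\,y(t))$ solves $dY = [R\,dy + \frac12 qR^2\,dt]Y$. The log-sum approximation of $J$ is taken here to be the sum of principal logarithms of the increments $Y(i2^{-n})Y^{-1}((i-1)2^{-n})$, which is our reading of the log series used in the proof. It recovers $y$ to rounding error. The Riemann (Ito) sum of (43) recovers $y$ up to discretization error, and its component outside $L = \mathrm{span}\{R\}$ nearly vanishes. All function names are hypothetical.

```python
import numpy as np

# Hypothetical example: G = SO(2), one generator R with R @ R = -I.
R = np.array([[0.0, -1.0], [1.0, 0.0]])

def rotation(theta):
    """exp(theta * R), i.e. the 2x2 rotation by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def coeff_along_R(A):
    """Coefficient of R in a 2x2 matrix A (projection onto its skew part)."""
    return 0.5 * (A[1, 0] - A[0, 1])

def log_sum_J(Ys):
    """Sum of principal logs of Y(t_i) Y(t_{i-1})^{-1}, expressed in R-coordinates."""
    total = 0.0
    for prev, cur in zip(Ys[:-1], Ys[1:]):
        inc = cur @ prev.T  # prev^{-1} = prev.T for a rotation matrix
        total += np.arctan2(inc[1, 0], inc[0, 0])  # principal angle of the increment
    return total

def ito_sum_K(Ys, q, dt):
    """Riemann (Ito) sum of (dY) Y^{-1} - M dt with M = (q/2) R @ R, as in (43)."""
    M = 0.5 * q * (R @ R)
    K = np.zeros((2, 2))
    for prev, cur in zip(Ys[:-1], Ys[1:]):
        K += (cur - prev) @ np.linalg.inv(prev) - M * dt
    return K

rng = np.random.default_rng(0)
q, n_steps = 0.5, 2**14
dt = 1.0 / n_steps
increments = np.sqrt(q * dt) * rng.standard_normal(n_steps)
y = np.concatenate([[0.0], np.cumsum(increments)])  # y(t_i) with y(0) = 0
Ys = [rotation(v) for v in y]                       # Y(t_i) = exp(R y(t_i))

K = ito_sum_K(Ys, q, dt)
print("log-sum error:", abs(log_sum_J(Ys) - y[-1]))
print("Ito-sum error along R:", abs(coeff_along_R(K) - y[-1]))
print("component of K outside L:", abs(K[0, 0]))
```

The log-sum form is exact here because the principal logarithm of each small rotation increment returns the increment of $y$ itself. The Ito sum, by contrast, only matches $y$ to order $\sum|\Delta y|^3$.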


VI. 2. Hypotheses on Lie Groups and Evaluation of Likelihood Ratios.

The (almost sure) bijection constructed in the previous section will be used to formulate a detection problem on a matrix Lie group $G$ and to derive a likelihood ratio formula as a function of the updated observation. Let us first write down a pair of hypotheses on the Lie algebra $L$ of $G$ in the form of $m$-vector Ito differential equations:



Let the measures on $(C_g, B_g)$ induced by $Y$ under $H_1$ and $H_0$ be denoted by $\nu_1$ and $\nu_0$, respectively. From the measure-theoretic viewpoint, the detection problem is to evaluate the Radon-Nikodym derivative $d\nu_1/d\nu_0$ on $(C_g, B_g)$, assuming it exists.

Let the measures on $(C^m, B^m)$ induced by $y$ under $H_1$ and $H_0$ be denoted by $\mu_1$ and $\mu_0$, respectively. We note ([43], [44]) that if $\int_0^a m'Q^{-1}m\,dt < \infty$, a.s. (P), then

$$\frac{d\mu_1}{d\mu_0}(y) = \exp\Big\{-\frac12\int_0^a \hat m'(t)Q^{-1}(t)\hat m(t)\,dt + \int_0^a \hat m'(t)Q^{-1}(t)\,dy(t)\Big\}, \quad \text{a.s. (P)}, \qquad (54)$$

where $\hat m(t) = E(m(t) \mid y^t, H_1)$ and $y^t$ is the restriction of $y$ to $[0,t]$.
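To make (54) concrete, here is a minimal sketch of its left-point (Ito) discretization. It rests on two simplifying assumptions that are not from the source: $Q$ is constant, and the message $m$ is deterministic and known, so that $\hat m = m$ (in the report, $\hat m$ is a conditional expectation that must itself be computed). The function name `log_likelihood_ratio` and the numerical values are hypothetical.

```python
import numpy as np

def log_likelihood_ratio(m_hat, dy, Q, dt):
    """Left-point discretization of log (54):
    -1/2 * sum m_hat' Q^{-1} m_hat dt + sum m_hat' Q^{-1} dy.
    m_hat, dy: arrays of shape (N, m); Q: constant (m, m) covariance rate."""
    Q_inv = np.linalg.inv(Q)
    quad = np.einsum("ti,ij,tj->", m_hat, Q_inv, m_hat) * dt
    lin = np.einsum("ti,ij,tj->", m_hat, Q_inv, dy)
    return -0.5 * quad + lin

rng = np.random.default_rng(2)
a, N = 4.0, 4000
dt = a / N
Q = np.array([[1.0, 0.3], [0.3, 0.5]])
L = np.linalg.cholesky(Q)                       # Q = L L'
m = np.tile([3.0, -2.0], (N, 1))                # hypothetical known message
dy_h1 = m * dt + (rng.standard_normal((N, 2)) @ L.T) * np.sqrt(dt)  # increments under H1
dy_h0 = (rng.standard_normal((N, 2)) @ L.T) * np.sqrt(dt)          # increments under H0

ll1 = log_likelihood_ratio(m, dy_h1, Q, dt)
ll0 = log_likelihood_ratio(m, dy_h0, Q, dt)
print(ll1, ll0)
```

Under $H_1$ the log-ratio concentrates around $+\frac12\int_0^a m'Q^{-1}m\,dt$, and under $H_0$ around the negative of that value. That separation is what makes the ratio usable as a test statistic.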
As $Y = H(y)$ and $H$ is almost surely bijective, we may anticipate that $d\nu_1/d\nu_0$ is equal to $d\mu_1/d\mu_0$ after a change of variables. This is indeed the case.


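Lemma 5. Let $\theta_1$ and $\theta_2$ be any random objects taking values in the same measurable space $(\Theta, \Theta^*)$, $\Theta^*$ being a σ-field in $\Theta$. Let $f$ be a measurable mapping from $(\Theta, \Theta^*)$ into another measurable space $(\Lambda, \Lambda^*)$, and let $\lambda_1 = f(\theta_1)$ and $\lambda_2 = f(\theta_2)$. Let the measures on $(\Theta, \Theta^*)$ and $(\Lambda, \Lambda^*)$ induced by $\theta_i$ and $\lambda_i$ be denoted by $\pi_i$ and $\eta_i$, respectively, for $i = 1$ and $2$. If $\pi_1 \ll \pi_2$, then $\eta_1 \ll \eta_2$ and

$$\frac{d\eta_1}{d\eta_2}(f(\theta)) = E_{\pi_2}\Big(\frac{d\pi_1}{d\pi_2}\,\Big|\,\sigma(f)\Big)(\theta), \quad \text{a.s. } (\pi_2),$$

where $\sigma(f)$ denotes the σ-subfield of $\Theta^*$ generated by $f$. If, in addition, $f$ is bijective a.s. $(\pi_2)$, then

$$\frac{d\eta_1}{d\eta_2}(f(\theta)) = \frac{d\pi_1}{d\pi_2}(\theta), \quad \text{a.s. } (\pi_2).$$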
This lemma is an immediate extension of Lemma 3, p. 99 in [52]. A proof can be found in ['^7].
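A finite analogue makes the content of Lemma 5 easy to check. The sketch below is a hypothetical example and not from the source. It places two distributions on a six-point set $\Theta$ and uses a non-injective map $f(\theta) = \theta \bmod 3$. The ratio of the pushforward distributions at $f(\theta)$ equals the $\pi_2$-average of the original ratio over the fiber $f^{-1}(f(\theta))$, which is exactly a conditional expectation given $\sigma(f)$. For a bijective map the two ratios coincide pointwise. Exact rational arithmetic avoids rounding issues.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite example: Theta = {0, ..., 5} with two distributions pi1, pi2.
pi1 = {t: Fraction(w, 21) for t, w in enumerate([1, 2, 3, 4, 5, 6])}
pi2 = {t: Fraction(1, 6) for t in range(6)}

def f(t):
    """A measurable but non-injective map Theta -> Lambda = {0, 1, 2}."""
    return t % 3

def pushforward(p, fn):
    """Distribution of fn(theta) when theta ~ p."""
    out = defaultdict(Fraction)
    for t, w in p.items():
        out[fn(t)] += w
    return dict(out)

eta1, eta2 = pushforward(pi1, f), pushforward(pi2, f)

def cond_exp_of_density(theta):
    """E_{pi2}(dpi1/dpi2 | sigma(f)) at theta: pi2-weighted average over the fiber of f(theta)."""
    fiber = [t for t in pi2 if f(t) == f(theta)]
    num = sum(pi2[t] * (pi1[t] / pi2[t]) for t in fiber)
    den = sum(pi2[t] for t in fiber)
    return num / den

# First statement: d eta1/d eta2 (f(theta)) = E(d pi1/d pi2 | sigma(f))(theta).
for theta in range(6):
    assert eta1[f(theta)] / eta2[f(theta)] == cond_exp_of_density(theta)

# Second statement: for a bijective map the two derivatives coincide pointwise.
def h(t):
    return (t + 2) % 6

zeta1, zeta2 = pushforward(pi1, h), pushforward(pi2, h)
for theta in range(6):
    assert zeta1[h(theta)] / zeta2[h(theta)] == pi1[theta] / pi2[theta]

print({lam: eta1[lam] / eta2[lam] for lam in sorted(eta1)})
```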

In view of this lemma, it is now easily seen that $(d\nu_1/d\nu_0)(Y)$ is equal to the right side of (54) with $y^t$ replaced by $H^{-1}(Y^t) = J(Y^t)$, $Y^t$ being the restriction of $Y$ to $[0,t]$.

Let $e_{ij}$ be the $g \times g$ matrix whose $(i,j)$ component is one and whose other components are zero. Let $\{R_{m+j},\ j = 1, \dots, g^2 - m\}$ be $g \times g$ matrices such that $\{R_j,\ j = 1, \dots, g^2\}$ form a basis of the space $\mathbb{R}^{g\times g}$ of all $g \times g$ matrices. Now we may write $e_{ij} = \sum_{k=1}^{g^2} e^k_{ij}R_k$ for some constants $\{e^k_{ij}\}$. Let $e^k = [e^k_{ij}]$ be the matrix whose $(i,j)$ component is $e^k_{ij}$. (55)


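Since

$$\int_0^t (dY(\tau))\,Y^{-1}(\tau) - \int_0^t M(\tau)\,d\tau$$

belongs to $L$, which is spanned by $\{R_1, \dots, R_m\}$, we have

$$\mathrm{tr}\Big\{e^{j\prime}\Big[\int_0^t (dY(\tau))\,Y^{-1}(\tau) - \int_0^t M(\tau)\,d\tau\Big]\Big\} = 0, \quad \text{for } j > m.$$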


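From (38), it can be shown by simple calculation that

$$y(t) = (J(Y))(t) = \Big[\mathrm{tr}\Big\{e^{1\prime}\Big[\int_0^t (dY(\tau))Y^{-1}(\tau) - \int_0^t M(\tau)\,d\tau\Big]\Big\}, \dots, \mathrm{tr}\Big\{e^{m\prime}\Big[\int_0^t (dY(\tau))Y^{-1}(\tau) - \int_0^t M(\tau)\,d\tau\Big]\Big\}\Big]', \qquad (56)$$

where $M$ is defined by (39).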
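The trace formula behind (55) and (56) amounts to reading off coordinates in a basis through "dual" matrices $e^k$, with $\mathrm{tr}(e^{k\prime}A) = \sum_{i,j} e^k_{ij}A_{ij}$. The sketch below is a minimal illustration under assumptions of our own. It takes $g = 3$ and $L = so(3)$ with hypothetical generators $R_1, R_2, R_3$, and completes that set with six symmetric matrices to a basis of all $3 \times 3$ matrices. Stacking $\mathrm{vec}(R_k)$ as columns of a matrix $B$, the numbers $e^k_{ij}$ form row $k$ of $B^{-1}$, because $\mathrm{vec}(e_{ij})$ is a unit vector. For $A \in L$, the coordinates beyond $m$ vanish, as stated for $j > m$ above.

```python
import numpy as np

g = 3
# Hypothetical choice: R_1, R_2, R_3 span L = so(3); the remaining g^2 - m = 6
# symmetric "unit" matrices complete a basis of all 3x3 matrices.
R_lie = [
    np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float),
    np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float),
    np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float),
]
R_rest = []
for i in range(g):
    for j in range(i, g):
        S = np.zeros((g, g))
        S[i, j] = S[j, i] = 1.0
        R_rest.append(S)
R_all = R_lie + R_rest
m = len(R_lie)

# Column k of B is vec(R_k) (row-major), so vec(A) = B @ coefficients.
B = np.column_stack([Rk.reshape(-1) for Rk in R_all])
B_inv = np.linalg.inv(B)
# Since e_ij = sum_k e^k_ij R_k, the numbers e^k_ij form row k of B^{-1}.
e = [B_inv[k].reshape(g, g) for k in range(g * g)]

def coordinates(A):
    """[tr(e^{k'} A)]_k: the coefficients of A in the basis R_1, ..., R_{g^2}."""
    return np.array([np.trace(ek.T @ A) for ek in e])

A = 0.3 * R_lie[0] - 1.2 * R_lie[1] + 2.0 * R_lie[2]  # an element of L
c = coordinates(A)
print(np.round(c, 12))
```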

Now we note that $E(m(t) \mid Y^t, H_1) = E(m(t) \mid y^t, H_1)$, since $J$ is bijective and $Y^t$ and $J(Y^t)$ generate the same σ-subfield of $A$.

Summarizing what has been shown, we obtain the following theorem.

Theorem 13. Given a matrix Lie group $G$, we can formulate a detection problem on it, described by the bilinear matrix Ito equations (52) and (53). If $\int_0^a m'Q^{-1}m\,dt < \infty$, a.s. (P), the likelihood ratio can be written as

$$\frac{d\nu_1}{d\nu_0}(Y) = \exp\Big\{-\frac12\int_0^a \hat m_t'Q^{-1}(t)\hat m_t\,dt + \int_0^a \hat m_t'Q^{-1}(t)\,dy(t)\Big\},$$

where

$$dy(t) = \Big[\mathrm{tr}\{e^{1\prime}[(dY(t))Y^{-1}(t) - M(t)\,dt]\}, \dots, \mathrm{tr}\{e^{m\prime}[(dY(t))Y^{-1}(t) - M(t)\,dt]\}\Big]',$$

$$\hat m_t = E(m(t) \mid Y^t, H_1), \qquad M(t) = \frac12\sum_{i=1}^{m}\sum_{j=1}^{m} Q_{ij}(t)R_iR_j,$$

and $e^k$ is defined by (55).


VI. 3. Detection for Bilinear Systems.


In the previous section we have shown how a detection problem on a matrix Lie group can be formulated if we are given the matrix Lie group. We treated a class of detection problems by starting with a $C^m$ representation and injecting it to obtain a $C_g$ representation. Motivated by the existence of the bijective operator discussed in subsection VI. 1, we ask whether we can reverse this process, i.e., can we start with a $C_g$ representation and obtain a $C^m$ representation? In this section we answer that question in detail and thereby treat a large class of bilinear detection problems.
In fact, in view of the bilinear form of the hypotheses $H_1$ and $H_0$, a natural question arises as to the detection problem for arbitrary bilinear $g \times g$-matrix Ito differential equations of the following form:


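$$H_1:\quad dY = \Big[\sum_{i=1}^{\alpha} A_i m_i\,dt + \sum_{i=1}^{\beta} B_i\,dw_i + \sum_{i=1}^{\gamma} C_i q_i\,dt\Big]Y, \quad Y(0) = I, \qquad (57)$$

$$H_0:\quad dY = \Big[\sum_{i=1}^{\beta} B_i\,dw_i + \sum_{i=1}^{\gamma} C_i q_i\,dt\Big]Y, \quad Y(0) = I. \qquad (58)$$

For simplicity in the illustration of our approach, we assume: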
I. $\{A_i\}$, $\{B_i\}$, and $\{C_i\}$ are constant matrices,

II. $\{A_i,\ i = 1, \dots, \alpha\}$ are linearly independent,

III. $w = [w_1, \dots, w_\beta]'$ is a standard β-vector Wiener process,

IV. $m = [m_1, \dots, m_\alpha]'$ is a measurable α-vector stochastic process such that $\sum_{i=1}^{\alpha}\int_0^a m_i^2(t)\,dt < \infty$, a.s. (P),

V. $q = [q_1, \dots, q_\gamma]'$ is a measurable γ-vector stochastic process such that $\sum_{i=1}^{\gamma}\int_0^a q_i^2(t)\,dt < \infty$, a.s. (P), and $q(t)$ is $Y^t$-measurable,

VI. $w$ is independent of $m$.

Comparing these hypotheses with those defined by (52) and (53), we anticipate a solution similar to that in the previous section. We notice that if the solution $Y$ to both (57) and (58) lies on a matrix Lie group ($Y(t)$ is then nonsingular), and $m$ and $w$ enter the observation via the Lie algebra, then an almost sure bijection $J$ exists which transforms $Y$ into a causally equivalent vector process $\underline{y}$. The detection problem at hand is then readily solved.

Let $L$ denote the Lie algebra generated by $\{A_1, \dots, A_\alpha, B_1, \dots, B_\beta\}$, and suppose that $L$ is an $n$-dimensional linear space of which $\{R_1, \dots, R_n\}$ is a basis. Let $Q$ denote the $n \times \beta$ matrix given by (59).
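Working out the dimension of a generated Lie algebra, such as the $n$ above, is a mechanical closure computation. The sketch below shows one standard way to do it; it is an illustration and not the report's procedure, and the name `lie_algebra_basis` is hypothetical. It repeatedly adds commutators $[X, Y] = XY - YX$ of current basis elements and keeps a candidate only when it raises the rank of the span. It stops when a full round adds nothing. Since the dimension can never exceed $g^2$, the loop terminates.

```python
import numpy as np
from itertools import product

def lie_algebra_basis(generators, tol=1e-9, max_rounds=50):
    """Basis of the matrix Lie algebra generated by `generators`: the smallest
    linear space containing them that is closed under [X, Y] = XY - YX."""
    basis = []

    def try_add(X):
        candidate = np.array([b.reshape(-1) for b in basis] + [X.reshape(-1)])
        if np.linalg.matrix_rank(candidate, tol=tol) > len(basis):
            basis.append(X)
            return True
        return False

    for X in generators:
        try_add(np.asarray(X, dtype=float))
    for _ in range(max_rounds):
        grew = False
        for X, Y in product(list(basis), repeat=2):  # snapshot of the current basis
            if try_add(X @ Y - Y @ X):
                grew = True
        if not grew:
            break
    return basis

Rx = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
Ry = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
print(len(lie_algebra_basis([Rx, Ry])))  # two rotation generators
```

For example, two rotation generators already generate all of $so(3)$ because their commutator is the third generator. Two commuting diagonal matrices span only a 2-dimensional algebra.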

Assume that the Lie algebra generated by $\{A_1, \dots, A_\alpha, B_1, \dots, B_\beta, C_1, \dots, C_\gamma\}$ is an $m$-dimensional linear space. In view of the results in Section 2, we may presume that a skew observation $J(Y)$ in the form of (50) exists and write

$$d\underline{y} = f\,dt + \Xi\,dw, \qquad (60)$$

where the $m$-vector stochastic process $f$ and the $m \times \beta$ matrix $\Xi$ are to be determined.

Substituting (60) into (18), we obtain, under $H_1$,

$$f = m + r, \qquad m = [m_1, \dots, m_\alpha, 0, \dots, 0]',$$


$$r = [r_1, \dots, r_m]', \qquad \sum_{i=1}^{\gamma} C_i q_i - \frac12\sum_{i=1}^{n}\sum_{j=1}^{n}\bar Q_{ij}R_iR_j = \sum_{i=1}^{m} r_iR_i, \qquad (61)$$

where $\bar Q_{ij}$ is the $(i,j)$ component of the $n \times n$ matrix $\bar Q = QQ'$,


$$\Xi = \begin{bmatrix} Q \\ 0_{(m-n)\times\beta} \end{bmatrix},$$


» jr  N(T)  K •etiefti 


I 


• 
and similarly $\underline{y} = J(Y)$ under $H_0$ satisfies

$$d\underline{y} = \Xi\,dw + r\,dt.$$

This shows that the hypotheses (57) and (58) are interpretable on the Lie group $G$ of $L$. Therefore, $Y(t)$ is invertible and, by Theorem 1, $\underline{y}$ and $Y$ are causally equivalent.

Let $C_g$ and $B_g$ be defined as in the previous sections, and let the measures on $(C_g, B_g)$ induced by $Y$ under $H_1$ and $H_0$ be denoted by $\rho_1$ and $\rho_0$, respectively. In the following, we will evaluate the Radon-Nikodym derivative $d\rho_1/d\rho_0$, assuming it exists.

Let $C^k$ ($k = \alpha$ or $k = m - \alpha$) denote the family of continuous $k$-vector-valued functions on $T$, and let $B^k$ denote the Borel σ-field of $C^k$.

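Let $y = [y_1, \dots, y_\alpha]'$, and let $\nu^1_1$ be the measure on $(C^\alpha, B^\alpha)$ induced by $y$ under $H_1$. Let $z = [y_{\alpha+1}, \dots, y_m]'$, and let $\nu^2_1$ be the measure on $(C^{m-\alpha}, B^{m-\alpha})$ induced by $z$ under $H_1$. The measures $\nu^1_0$ and $\nu^2_0$ are defined in the same way under $H_0$. Then the measure on $(C^m, B^m)$ induced by $\underline{y}$ under $H_1$ is equal to the product measure $\nu^1_1 \times \nu^2_1$. It is easy to see that $\nu^2_1 = \nu^2_0$, and thus $d\nu^2_1/d\nu^2_0 = 1$. Using a well-known lemma (Lemma 2, p. 99 in [52]), we have, for $t \in T$,

$$\frac{d(\nu^1_1\times\nu^2_1)}{d(\nu^1_0\times\nu^2_0)}(\underline{y}^t) = \frac{d\nu^1_1}{d\nu^1_0}(y^t)\,\frac{d\nu^2_1}{d\nu^2_0}(z^t) = \frac{d\nu^1_1}{d\nu^1_0}(y^t), \qquad (62)$$

provided $d\nu^1_1/d\nu^1_0$ exists.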

We note that $y$ under $H_1$ satisfies

$$dy = m\,dt + I^\alpha Q\,dw + \bar r\,dt,$$

while $y$ under $H_0$ satisfies

$$dy = I^\alpha Q\,dw + \bar r\,dt,$$

where $\bar r = I^\alpha r$. If $\det(I^\alpha\bar QI^{\alpha\prime}) \ne 0$, where $\bar Q = QQ'$, it is known that the likelihood ratio of $\nu^1_1$ to $\nu^1_0$ can be written as

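$$\frac{d\nu^1_1}{d\nu^1_0}(y^t) = \exp\Big\{-\frac12\int_0^t \hat m'(s)(I^\alpha\bar QI^{\alpha\prime})^{-1}\hat m(s)\,ds - \int_0^t \hat m'(s)(I^\alpha\bar QI^{\alpha\prime})^{-1}\bar r(s)\,ds + \int_0^t \hat m'(s)(I^\alpha\bar QI^{\alpha\prime})^{-1}\,dy(s)\Big\}, \qquad (63)$$

where $\hat m(s) = E(m(s) \mid y^s, H_1)$. Note that the assumption that $q(t)$ is $Y^t$-measurable is used here.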

Since  J is  bijectlve,  we  have,  by  Lemma  5, 


dp  d(vj  X v^)  dv^ 

— (Y)  = ± ^ (J(Y)) f (J(Y))  (64) 

‘^^''o  ’*  ''O^  ‘*''0 


Let  be  defined  as  in  (55)  and  write  J(Y)  as  (56). 

Substituting  (bl)  into  (64)  then  leads  to  the  lollowing  thenresi. 


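Theorem 14. For the two hypotheses $H_1$ and $H_0$ described by (57) and (58),

$$\frac{d\rho_1}{d\rho_0}(Y^t) = \exp\Big[-\frac12\int_0^t \hat m_s'(I^\alpha\bar QI^{\alpha\prime})^{-1}\hat m_s\,ds - \int_0^t \hat m_s'(I^\alpha\bar QI^{\alpha\prime})^{-1}\bar r(s)\,ds + \int_0^t \hat m_s'(I^\alpha\bar QI^{\alpha\prime})^{-1}\,dy(s)\Big],$$

where

$$\hat m_s = E(m(s) \mid Y^s, H_1),$$

$$dy(s) = \Big[\mathrm{tr}\{e^{1\prime}[(dY(s))Y^{-1}(s) - M\,ds]\}, \dots, \mathrm{tr}\{e^{\alpha\prime}[(dY(s))Y^{-1}(s) - M\,ds]\}\Big]',$$

$$M = \frac12\sum_{i=1}^{n}\sum_{j=1}^{n}\bar Q_{ij}R_iR_j, \qquad \bar Q = QQ',$$

and $I^\alpha$, $e^k$, and $\bar r\ (= I^\alpha r)$ are defined by (62), (55), and (61), respectively.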


VI. 4. Least-Squares Estimation.

In view of Theorem 13 and Theorem 14, the evaluation of the likelihood ratio depends on the evaluation of the conditional expectation $\hat m_s = E(m(s) \mid Y^s, H_1)$. We recall that under $H_1$,

$$dY = \Big[\sum_{i=1}^{\alpha} A_i m_i\,dt + \sum_{i=1}^{\beta} B_i\,dw_i + \sum_{i=1}^{\gamma} C_i q_i\,dt\Big]Y, \quad Y(0) = I. \qquad (65)$$

The evaluation of $\hat m_s$ is thus a nonlinear filtering problem.

In this section we will use the transformation techniques of the previous sections to solve this filtering problem for the following model:


$$m(t) = H(t)x(t), \qquad (66)$$
$$dx(t) = F(t)x(t)\,dt + G(t)\,dv(t), \quad x(0) = x_0. \qquad (67)$$


Here $x(t)$ is a random vector; $H(t)$, $F(t)$, and $G(t)$ are matrices of appropriate dimensions; $v$ is a standard Wiener process; $x_0$ is a normal random vector; and $w$, $v$, $x_0$ are statistically independent.

It has been seen in the previous section that

$$E(m(t) \mid Y^t, H_1) = E(m(t) \mid \underline{y}^t, H_1) = E(m(t) \mid y^t, H_1), \quad \text{a.s. (P)}. \qquad (68)$$

Let $C^\alpha$ denote the family of $\alpha$-vector-valued continuous functions on $T$ and $B^\alpha$ its Borel σ-field in the uniform topology. Then there exist a $B_g$-measurable functional $f_1$ on $C_g$ and a $B^\alpha$-measurable functional $f_2$ on $C^\alpha$ such that

$$f_1(Y^t) = E(m(t) \mid Y^t, H_1), \quad \text{a.s.},$$

$$f_2(y^t) = E(m(t) \mid y^t, H_1), \quad \text{a.s.},$$

and

$$f_1(Y^t) = f_2(I^\alpha J(Y^t)), \quad \text{a.s.}, \qquad (69)$$

where $I^\alpha J: C_g \to C^\alpha$ is defined by $(I^\alpha J(Y))(t) = I^\alpha[(J(Y))(t)] = y(t)$.

In the following, we will denote $f_1(Y^t)$ by $\hat m_t$ and $f_2(y^t)$ by $\hat m(t)$. We note that this convention is consistent with the previous sections.

Under the assumptions (66) and (67), it is well known that $\hat m(t)$ satisfies the following Kalman-Bucy filtering equations:

$$\hat m(t) = H(t)\hat x(t),$$

$$d\hat x(t) = F(t)\hat x(t)\,dt + P(t)H'(t)(I^\alpha\bar QI^{\alpha\prime})^{-1}\big(dy(t) - \bar r\,dt - H(t)\hat x(t)\,dt\big),$$

$$\dot P(t) = F(t)P(t) + P(t)F'(t) - P(t)H'(t)(I^\alpha\bar QI^{\alpha\prime})^{-1}H(t)P(t) + G(t)G'(t). \qquad (70)$$
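For readers who want to run the filter, here is a minimal Euler-discretization sketch of the Kalman-Bucy equations. It is an illustration under stated assumptions and not the report's implementation. The observation is modeled as $dy = (Hx + \bar r)\,dt + \text{noise}$ with a known offset $\bar r$ and a noise covariance rate `Sigma`, which plays the role of $I^\alpha\bar QI^{\alpha\prime}$. The known offset is subtracted inside the innovation, which is the standard Kalman-Bucy form for such an observation. The scalar model and every number in it are hypothetical.

```python
import numpy as np

def kalman_bucy(dy, r_bar, F, G, H, Sigma, x0_mean, P0, dt):
    """Euler discretization of the Kalman-Bucy filter.

    dy:    (N, a) observation increments
    r_bar: (N, a) known offset rates, subtracted inside the innovation
    Returns the estimates x_hat at each grid point, shape (N+1, d), and the final P."""
    S_inv = np.linalg.inv(Sigma)
    x_hat = np.array(x0_mean, dtype=float)
    P = np.array(P0, dtype=float)
    estimates = [x_hat.copy()]
    for k in range(dy.shape[0]):
        gain = P @ H.T @ S_inv                                   # P H' Sigma^{-1}
        innovation = dy[k] - (r_bar[k] + H @ x_hat) * dt
        x_hat = x_hat + (F @ x_hat) * dt + gain @ innovation
        P = P + (F @ P + P @ F.T - gain @ H @ P + G @ G.T) * dt  # Riccati step
        estimates.append(x_hat.copy())
    return np.array(estimates), P

# Hypothetical scalar model: dx = -x dt + 2 dv, m = x, observation noise rate 0.1.
rng = np.random.default_rng(6)
dt, N = 4e-3, 25_000
F, G, H = np.array([[-1.0]]), np.array([[2.0]]), np.array([[1.0]])
Sigma = np.array([[0.1]])
L = np.linalg.cholesky(Sigma)
r_bar = np.full((N, 1), 0.7)

x = np.zeros((N + 1, 1))
x[0] = rng.standard_normal(1)                   # x_0 ~ N(0, 1)
dy = np.zeros((N, 1))
for k in range(N):
    dy[k] = (H @ x[k] + r_bar[k]) * dt + L @ rng.standard_normal(1) * np.sqrt(dt)
    x[k + 1] = x[k] + (F @ x[k]) * dt + G @ rng.standard_normal(1) * np.sqrt(dt)

x_hat, P_final = kalman_bucy(dy, r_bar, F, G, H, Sigma, [0.0], [[1.0]], dt)
mse_filter = np.mean((x_hat[:, 0] - x[:, 0]) ** 2)
mse_prior = np.mean(x[:, 0] ** 2)
print(mse_filter, mse_prior, P_final[0, 0])
```

The empirical error of the filter should sit near the steady-state Riccati value $P$, well below the variance of the state itself. That agreement is a quick sanity check on the discretization step size.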


Using these equations together with (68) and (69), we obtain the following theorem.

Theorem 15. Let the message process $m$ and the observation process $Y$ be as described by equations (66), (67), and (65). Then the conditional expectation $\hat m_t$ satisfies

$$\hat m_t = H(t)\hat x_t,$$

$$d\hat x_t = F(t)\hat x_t\,dt + P(t)H'(t)(I^\alpha\bar QI^{\alpha\prime})^{-1}\big(dy(t) - \bar r\,dt - H(t)\hat x_t\,dt\big),$$

$$dy(t) = \Big[\mathrm{tr}\{e^{1\prime}[(dY(t))Y^{-1}(t) - M\,dt]\}, \dots, \mathrm{tr}\{e^{\alpha\prime}[(dY(t))Y^{-1}(t) - M\,dt]\}\Big]',$$

where $P$, $I^\alpha$, $e^k$, $\bar r$, $M$, and $Q$ are determined by (70), (62), (55), (61), (39), and (59), respectively.